Database Design
No part of this publication may be reproduced, transmitted, or translated in any form or by any means, electronic, mechanical, manual, optical, or
otherwise, without the prior written permission of iAnywhere Solutions, Inc. iAnywhere Solutions, Inc. is a subsidiary of Sybase, Inc.
Sybase, SYBASE (logo), AccelaTrade, ADA Workbench, Adaptable Windowing Environment, Adaptive Component Architecture, Adaptive Server,
Adaptive Server Anywhere, Adaptive Server Enterprise, Adaptive Server Enterprise Monitor, Adaptive Server Enterprise Replication, Adaptive
Server Everywhere, Adaptive Server IQ, Adaptive Warehouse, AnswerBase, Anywhere Studio, Application Manager, AppModeler,
APT Workbench, APT-Build, APT-Edit, APT-Execute, APT-Library, APT-Translator, ASEP, AvantGo, AvantGo Application Alerts, AvantGo
Mobile Delivery, AvantGo Mobile Document Viewer, AvantGo Mobile Inspection, AvantGo Mobile Marketing Channel, AvantGo Mobile Pharma,
AvantGo Mobile Sales, AvantGo Pylon, AvantGo Pylon Application Server, AvantGo Pylon Conduit, AvantGo Pylon PIM Server, AvantGo
Pylon Pro, Backup Server, BayCam, Bit-Wise, BizTracker, Certified PowerBuilder Developer, Certified SYBASE Professional, Certified SYBASE
Professional (logo), ClearConnect, Client Services, Client-Library, CodeBank, Column Design, ComponentPack, Connection Manager,
Convoy/DM, Copernicus, CSP, Data Pipeline, Data Workbench, DataArchitect, Database Analyzer, DataExpress, DataServer, DataWindow,
DB-Library, dbQueue, Developers Workbench, Direct Connect Anywhere, DirectConnect, Distribution Director, Dynamic Mobility Model,
Dynamo, e-ADK, E-Anywhere, e-Biz Integrator, E-Whatever, EC Gateway, ECMAP, ECRTP, eFulfillment Accelerator, Electronic Case
Management, Embedded SQL, EMS, Enterprise Application Studio, Enterprise Client/Server, Enterprise Connect, Enterprise Data Studio,
Enterprise Manager, Enterprise Portal (logo), Enterprise SQL Server Manager, Enterprise Work Architecture, Enterprise Work Designer, Enterprise
Work Modeler, eProcurement Accelerator, eremote, Everything Works Better When Everything Works Together, EWA, Financial Fusion, Financial
Fusion (and design), Financial Fusion Server, Formula One, Fusion Powered e-Finance, Fusion Powered Financial Destinations, Fusion
Powered STP, Gateway Manager, GeoPoint, GlobalFIX, iAnywhere, iAnywhere Solutions, ImpactNow, Industry Warehouse Studio, InfoMaker,
Information Anywhere, Information Everywhere, InformationConnect, InstaHelp, Intelligent Self-Care, InternetBuilder, iremote, iScript,
Jaguar CTS, jConnect for JDBC, KnowledgeBase, Logical Memory Manager, M-Business Channel, M-Business Network, M-Business Server, Mail
Anywhere Studio, MainframeConnect, Maintenance Express, Manage Anywhere Studio, MAP, MDI Access Server, MDI Database Gateway,
media.splash, Message Anywhere Server, MetaWorks, MethodSet, ML Query, MobiCATS, My AvantGo, My AvantGo Media Channel,
My AvantGo Mobile Marketing, MySupport, Net-Gateway, Net-Library, New Era of Networks, Next Generation Learning, Next Generation
Learning Studio, O DEVICE, OASiS, OASiS (logo), ObjectConnect, ObjectCycle, OmniConnect, OmniSQL Access Module, OmniSQL Toolkit,
Open Biz, Open Business Interchange, Open Client, Open Client/Server, Open Client/Server Interfaces, Open ClientConnect, Open Gateway, Open
Server, Open ServerConnect, Open Solutions, Optima++, Orchestration Studio, Partnerships that Work, PB-Gen, PC APT Execute, PC DB-Net,
PC Net Library, PhysicalArchitect, Pocket PowerBuilder, PocketBuilder, Power Through Knowledge, Power++, power.stop, PowerAMC,
PowerBuilder, PowerBuilder Foundation Class Library, PowerDesigner, PowerDimensions, PowerDynamo, Powering the New Economy, PowerJ,
PowerScript, PowerSite, PowerSocket, Powersoft, Powersoft Portfolio, Powersoft Professional, PowerStage, PowerStudio, PowerTips, PowerWare
Desktop, PowerWare Enterprise, ProcessAnalyst, QAnywhere, Rapport, Relational Beans, RepConnector, Replication Agent, Replication Driver,
Replication Server, Replication Server Manager, Replication Toolkit, Report Workbench, Report-Execute, Resource Manager, RW-DisplayLib,
RW-Library, S.W.I.F.T. Message Format Libraries, SAFE, SAFE/PRO, SDF, Secure SQL Server, Secure SQL Toolset, Security Guardian, SKILS,
smart.partners, smart.parts, smart.script, SQL Advantage, SQL Anywhere, SQL Anywhere Studio, SQL Code Checker, SQL Debug, SQL Edit,
SQL Edit/TPU, SQL Everywhere, SQL Modeler, SQL Remote, SQL Server, SQL Server Manager, SQL Server SNMP SubAgent, SQL Server/CFT,
SQL Server/DBM, SQL SMART, SQL Station, SQL Toolset, SQLJ, Stage III Engineering, Startup.Com, STEP, SupportNow, Sybase Central,
Sybase Client/Server Interfaces, Sybase Development Framework, Sybase Financial Server, Sybase Gateways, Sybase Learning Connection,
Sybase MPP, Sybase SQL Desktop, Sybase SQL Lifecycle, Sybase SQL Workgroup, Sybase Synergy Program, Sybase User Workbench, Sybase
Virtual Server Architecture, SybaseWare, Syber Financial, SyberAssist, SybMD, SyBooks, System 10, System 11, System XI (logo), SystemTools,
Tabular Data Stream, The Enterprise Client/Server Company, The Extensible Software Platform, The Future Is Wide Open, The Learning
Connection, The Model For Client/Server Solutions, The Online Information Center, The Power of One, TotalFix, TradeForce, Transact-SQL,
Translation Toolkit, Turning Imagination Into Reality, UltraLite, UltraLite.NET, UNIBOM, Unilib, Uninull, Unisep, Unistring, URK Runtime Kit
for UniCode, Versacore, Viewer, VisualWriter, VQL, Warehouse Control Center, Warehouse Studio, Warehouse WORKS, WarehouseArchitect,
Watcom, Watcom SQL, Watcom SQL Server, Web Deployment Kit, Web.PB, Web.SQL, WebSights, WebViewer, WorkGroup SQL Server,
XA-Library, XA-Server, and XP Server are trademarks of Sybase, Inc. or its subsidiaries.
Contents
Particular concurrency issues . . . . . . . . . . . . . . . . . . 149
Replication and concurrency . . . . . . . . . . . . . . . . . . 152
Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
Key joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
Physical data organization and access . . . . . . . . . . . . . 423
Indexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
Semantic query transformations . . . . . . . . . . . . . . . . 435
Subquery and function caching . . . . . . . . . . . . . . . . . 449
Reading access plans . . . . . . . . . . . . . . . . . . . . . . 451
Migrating databases to Adaptive Server Anywhere . . . . . . 581
Running SQL command files . . . . . . . . . . . . . . . . . . 586
Adaptive Server Enterprise compatibility . . . . . . . . . . . . 589
21 Debugging Logic in the Database 709
Introduction to debugging in the database . . . . . . . . . . . 710
Tutorial: Getting started with the debugger . . . . . . . . . . . 712
Working with breakpoints . . . . . . . . . . . . . . . . . . . . 721
Working with variables . . . . . . . . . . . . . . . . . . . . . . 724
Working with connections . . . . . . . . . . . . . . . . . . . . 725
Index 727
About This Manual
Subject This book describes how to design and create databases; how to import,
export, and modify data; how to retrieve data; and how to build stored
procedures and triggers.
Audience This manual is for all users of Adaptive Server Anywhere.
Before you begin This manual assumes that you have an elementary familiarity with
database-management systems and Adaptive Server Anywhere in particular.
If you do not have such a familiarity, you should consider reading
Introducing SQL Anywhere Studio before reading this manual.
SQL Anywhere Studio documentation
This book is part of the SQL Anywhere documentation set. This section
describes the books in the documentation set and how you can use them.
The SQL Anywhere Studio documentation The SQL Anywhere Studio documentation is available in a variety of forms: in an online form that combines all books in one large help file; as separate PDF files for each book; and as printed books that you can purchase. The documentation consists of the following books:
♦ Introducing SQL Anywhere Studio This book provides an overview of
the SQL Anywhere Studio database management and synchronization
technologies. It includes tutorials to introduce you to each of the pieces
that make up SQL Anywhere Studio.
(Trusted Computer System Evaluation Criteria) C2 security rating from
the U.S. Government. This book may be of interest to those who wish to
run the current version of Adaptive Server Anywhere in a manner
equivalent to the C2-certified environment.
♦ MobiLink Synchronization User’s Guide This book describes how to
use the MobiLink data synchronization system for mobile computing,
which enables sharing of data between a single Oracle, Sybase, Microsoft
or IBM database and many Adaptive Server Anywhere or UltraLite
databases.
♦ MobiLink Synchronization Reference This book is a reference guide
to MobiLink command line options, synchronization scripts, SQL
statements, stored procedures, utilities, system tables, and error messages.
♦ MobiLink Server-Initiated Synchronization User’s Guide This book
describes MobiLink server-initiated synchronization, a feature of
MobiLink that allows you to initiate synchronization from the
consolidated database.
♦ QAnywhere User’s Guide This manual describes MobiLink
QAnywhere, a messaging platform that enables the development and
deployment of messaging applications for mobile and wireless clients, as
well as traditional desktop and laptop clients.
♦ iAnywhere Solutions ODBC Drivers This book describes how to set
up ODBC drivers to access consolidated databases other than Adaptive
Server Anywhere from the MobiLink synchronization server and from
Adaptive Server Anywhere remote data access.
♦ SQL Remote User’s Guide This book describes all aspects of the
SQL Remote data replication system for mobile computing, which
enables sharing of data between a single Adaptive Server Anywhere or
Adaptive Server Enterprise database and many Adaptive Server
Anywhere databases using an indirect link such as e-mail or file transfer.
♦ SQL Anywhere Studio Help This book includes the context-sensitive
help for Sybase Central, Interactive SQL, and other graphical tools. It is
not included in the printed documentation set.
♦ UltraLite Database User’s Guide This book is intended for all
UltraLite developers. It introduces the UltraLite database system and
provides information common to all UltraLite programming interfaces.
♦ UltraLite Interface Guides A separate book is provided for each
UltraLite programming interface. Some of these interfaces are provided
as UltraLite components for rapid application development, and others
are provided as static interfaces for C, C++, and Java development.
In addition to this documentation set, PowerDesigner and InfoMaker include
their own online documentation.
Documentation formats SQL Anywhere Studio provides documentation in the following formats:
♦ Online documentation The online documentation contains the
complete SQL Anywhere Studio documentation, including both the
books and the context-sensitive help for SQL Anywhere tools. The online
documentation is updated with each maintenance release of the product,
and is the most complete and up-to-date source of documentation.
To access the online documentation on Windows operating systems,
choose Start ➤ Programs ➤ SQL Anywhere 9 ➤ Online Books. You can
navigate the online documentation using the HTML Help table of
contents, index, and search facility in the left pane, as well as using the
links and menus in the right pane.
To access the online documentation on UNIX operating systems, see the
HTML documentation under your SQL Anywhere installation.
♦ PDF books The SQL Anywhere books are provided as a set of PDF
files, viewable with Adobe Acrobat Reader.
The PDF books are accessible from the online books, or from the
Windows Start menu.
Documentation conventions
This section lists the typographic and graphical conventions used in this
documentation.
Syntax conventions The following conventions are used in the SQL syntax descriptions:
♦ Keywords All SQL keywords appear in upper case, like the words
ALTER TABLE in the following example:
ALTER TABLE [ owner.]table-name
♦ Repeating items Lists of repeating items are shown with an element of the list followed by an ellipsis (three dots), like this:
ADD column-definition [ column-constraint ], ...
One or more list elements are allowed. In this example, if more than one is specified, they must be separated by commas.
♦ Optional portions Optional portions of a statement are enclosed by
square brackets.
RELEASE SAVEPOINT [ savepoint-name ]
♦ Options When none or only one of a list of items can be chosen, the items are separated by vertical bars and the list is enclosed in square brackets.
[ ASC | DESC ]
For example, you can choose one of ASC, DESC, or neither. The square brackets should not be typed.
♦ Alternatives When precisely one of the options must be chosen, the
alternatives are enclosed in curly braces and a bar is used to separate the
options.
[ QUOTES { ON | OFF } ]
Graphic icons The following icons are used in this documentation.
♦ A client application.
♦ A programming interface.
The Adaptive Server Anywhere sample database
Many of the examples throughout the documentation use the Adaptive
Server Anywhere sample database.
The sample database is held in a file named asademo.db, and is located in
your SQL Anywhere directory.
The sample database represents a small company. It contains internal
information about the company (employees, departments, and finances) as
well as product information and sales information (sales orders, customers,
and contacts). All information in the database is fictional.
The following figure shows the tables in the sample database and how they
relate to each other.
[Diagram: the asademo.db sample database schema (partial). The visible portion shows the contact table (id integer <pk>, last_name char(15), first_name char(15), title char(2), street char(30), city char(20), state char(2), zip char(5), phone char(10), fax char(10)), the fin_code table (code char(2) <pk>, type char(10), description char(50)), the fin_data table (year char(4) <pk>, quarter char(2) <pk>, code char(2) <pk,fk>, amount numeric(9)), and the department table (dept_id integer <pk>, dept_name char(40), dept_head_id integer <fk>), with foreign key links on code = code, dept_id = dept_id, and emp_id = dept_head_id.]
Finding out more and providing feedback
We would like to receive your opinions, suggestions, and feedback on this
documentation.
You can provide feedback on this documentation and on the software
through newsgroups set up to discuss SQL Anywhere technologies. These
newsgroups can be found on the forums.sybase.com news server.
The newsgroups include the following:
♦ sybase.public.sqlanywhere.general
♦ sybase.public.sqlanywhere.linux
♦ sybase.public.sqlanywhere.mobilink
♦ sybase.public.sqlanywhere.product_futures_discussion
♦ sybase.public.sqlanywhere.replication
♦ sybase.public.sqlanywhere.ultralite
Newsgroup disclaimer
iAnywhere Solutions has no obligation to provide solutions, information
or ideas on its newsgroups, nor is iAnywhere Solutions obliged to provide
anything other than a systems operator to monitor the service and ensure its
operation and availability.
iAnywhere Solutions Technical Advisors as well as other staff assist on the
newsgroup service when they have time available. They offer their help
on a volunteer basis and may not be available on a regular basis to provide
solutions and information. Their ability to help is based on their workload.
You can e-mail comments and suggestions to the SQL Anywhere
documentation team at [email protected]. Although we do not
undertake to reply to e-mails at that address, you can be sure we will read
your suggestions with interest.
PART I
This part describes key concepts and strategies for designing and building
databases. It covers issues of database design as well as the mechanics of
working with tables, views, and indexes. It also includes material on
referential integrity and transactions.
CHAPTER 1
Designing Your Database
About this chapter This chapter introduces the basic concepts of relational database design and
gives you step-by-step suggestions for designing your own databases. It uses
the expedient technique known as conceptual data modeling, which focuses
on entities and the relationships between them.
Contents Topic: page
Introduction 4
Introduction
While designing a database is not a difficult task for small and medium-sized
databases, it is an important one. Bad database design can lead to an
inefficient and possibly unreliable database system. Because client
applications are built to work on specific parts of a database, and rely on the
database design, a bad design can be difficult to revise at a later date.
For more information, you may also wish to consult an introductory book
such as A Database Primer by C. J. Date. If you are interested in pursuing
database theory, C. J. Date’s An Introduction to Database Systems is an
excellent textbook on the subject.
Entities
An entity is the database equivalent of a noun. Distinguishable objects such
as employees, order items, departments and products are all examples of
entities. In a database, a table represents each entity. The entities that you
build into your database arise from the activities for which you will be using
the database, such as tracking sales calls and maintaining employee
information.
Attributes Each entity contains a number of attributes. Attributes are particular
characteristics of the things that you would like to store. For example, in an
employee entity, you might want to store an employee ID number, first and
last names, an address, and other particular information that pertains to a
particular employee. Attributes are also known as properties.
You depict an entity using a rectangular box. Inside, you list the attributes
associated with that entity.
Employee
Employee Number
First Name
Last Name
Address
Relationships
A relationship between entities is the database equivalent of a verb. An
employee is a member of a department, or an office is located in a city.
Relationships in a database may appear as foreign key relationships between
tables, or may appear as separate tables themselves. You will see examples
of each in this chapter.
The relationships in the database are an encoding of rules or practices that
govern the data in the entities. If each department has one department head,
you can create a one-to-one relationship between departments and
employees to identify the department head.
Once a relationship is built into the structure of the database, there is no
provision for exceptions. There is nowhere to put a second department head.
Duplicating the department entry would involve duplicating the department
ID, which is the identifier. Duplicate identifiers are not allowed.
Tip
Strict database structure can benefit you, because it can eliminate incon-
sistencies, such as a department with two managers. On the other hand,
you as the designer should make your design flexible enough to allow
some expansion for unforeseen uses. Extending a well-designed database
is usually not too difficult, but modifying the existing table structure can
render an entire database and its client applications obsolete.
Cardinality of relationships There are three kinds of relationships between tables. These correspond to the cardinality (number) of the entities involved in the relationship.
♦ One-to-one relationships You depict a relationship by drawing a
line between two entities. The line may have other markings on it such as
the two little circles shown. Later sections explain the purpose of these
marks. In the following diagram, one employee manages one department.
Department Employee
Management relationship
♦ One-to-many relationships In the following diagram, one office has many telephones.
Office Telephones
♦ Many-to-many relationships A many-to-many relationship has multiple
connections to both entities. This means that one warehouse can hold
many different parts, and one type of part can be stored at many
warehouses.
Parts Warehouses
storage relationship
Roles You can describe each relationship with two roles. Roles are verbs or
phrases that describe the relationship from each point of view. For example,
a relationship between employees and departments might be described by
the following two roles.
1. An employee is a member of a department.
2. A department contains an employee.
[Diagram: the Employee entity (Employee Number, First name, Last name, Address) is a member of the Department entity (Department ID, Department name); the Department contains Employees.]
Roles are very important because they afford you a convenient and effective
means of verifying your work.
Tip
Whether reading from left to right or from right to left, the following rule makes it easy to read these diagrams: read (1) the name of the first entity, (2) the role next to the first entity, (3) the cardinality from the connection to the second entity, and (4) the name of the second entity.
Mandatory elements The little circles just before the end of the line that denotes the relation serve
an important purpose. A circle means that an element can exist in the one
entity without a corresponding element in the other entity.
If a cross bar appears in place of the circle, that entity must contain at least
one element for each element in the other entity. An example will clarify
these statements.
[Diagram: the Publisher entity (ID Number, Publisher Name) publishes Books, and Books are published by a Publisher; the Books entity (ID Number, Title) is written by an Author, and the Author entity (ID Number, First Name, Last Name) writes Books.]
Tip
Think of the little circle as the digit 0 and the cross bar as the number one.
The circle means at least zero. The cross bar means at least one.
Reflexive relationships Sometimes, a relationship will exist between entries in a single entity. In this
case, the relationship is said to be reflexive. Both ends of the relationship
attach to a single entity.
Employee
Employee Number
First Name
Last Name
Address
manages reports to
[Diagram: the Parts entity (Part Number, Description) in a many-to-many storage relationship with the Warehouse entity (Warehouse ID, Address).]
Suppose you also wish to record the quantity of each part stored at each location. This
attribute can only be associated with the relationship. Each quantity depends
on both the parts and the warehouse involved. To represent this situation,
you can redraw the diagram as follows:
[Diagram: a new Inventory entity (Quantity) sits between the Parts entity (Part Number, Description) and the Warehouse entity (Warehouse ID, Address); Parts are stored at Inventory entries, and the Warehouse contains them.]
2. Each entry in the Inventory entity demands one mandatory entry in the
Parts entity and one mandatory entry in the Warehouse entity. These
relationships are mandatory because a storage relationship only makes
sense if it is associated with one particular part and one particular
warehouse.
3. The new entity is dependent on both the Parts entity and on the
Warehouse entity, meaning that the new entity is identified by the
identifiers of both of these entities. In this new diagram, one identifier
from the Parts entity and one identifier from the Warehouse entity
uniquely identify an entry in the Inventory entity. The triangles that
appear between the circles and the multiple lines that join the two new
relationships to the new Inventory entity denote the dependencies.
Do not add either a Part Number or Warehouse ID attribute to the Inventory
entity. Each entry in the Inventory entity does depend on both a particular
part and a particular warehouse, but the triangles denote this dependence
more clearly.
The design process
There are five major steps in the design process.
♦ “Step 1: Identify entities and relationships” on page 12.
♦ “Step 2: Identify the required data” on page 15.
♦ “Step 3: Normalize the data” on page 17.
♦ “Step 4: Resolve the relationships” on page 22.
♦ “Step 5: Verify the design” on page 25.
☞ For more information about implementing the database design, see
“Working with Database Objects” on page 29.
♦ Terminate employees.
♦ Maintain personal employee information.
[Diagram: the Employee, Department, Skill, and Office entities. An Employee is a member of a Department, which contains and is headed by an Employee; an Employee manages a Department, works out of an Office that contains Employees, and is capable of Skills that are acquired by Employees; Employees also manage and report to other Employees.]
Break down the high-level activities The following lower-level activities are based on the high-level activities listed above:
♦ Add or delete an employee.
♦ Add or delete an office.
♦ List employees for a department.
♦ Add a skill to the skill list.
♦ Identify the skills of an employee.
♦ Identify an employee’s skill level for each skill.
♦ Identify all employees that have the same skill level for a particular skill.
♦ Change an employee’s skill level.
These lower-level activities can be used to identify if any new tables or
relationships are needed.
Identify business rules Business rules often identify one-to-many, one-to-one, and many-to-many
relationships.
The kind of business rules that may be relevant include the following:
♦ There are now five offices; expansion plans allow for a maximum of ten.
♦ Employees can change department or office.
♦ Each department has one department head.
♦ Each office has a maximum of three telephone numbers.
Employee office
Employee address
If you make a diagram of this data, it will look something like this picture:
[Diagram: the Employee entity (Employee ID, First name, Last name, Home address) is capable of Skills; the Skill entity (Skill ID, Skill name, Skill description) is acquired by Employees.]
Observe that not all of the attributes you listed appear in this diagram. The
missing items fall into two categories:
1. Some are contained implicitly in other relationships; for example,
Employee department and Employee office are denoted by the relations
to the Department and Office entities, respectively.
2. Others are not present because they are associated not with either of these
entities, but rather with the relationship between them. The above diagram is
inadequate.
The first category of items will fall naturally into place when you draw the
entire entity-relationship diagram.
You can add the second category by converting this many-to-many
relationship into an entity.
[Diagram: a dependent Expert in entity (Skill level, Date acquired) between the Employee entity (Employee ID, First name, Last name, Home address) and the Skill entity (Skill ID, Skill name, Skill description); an Employee is capable of a Skill, and a Skill is acquired by an Employee.]
The new entity depends on both the Employee and the Skill entities. It
borrows its identifiers from these entities because it depends on both of them.
Notes ♦ When you are identifying the supporting data, be sure to refer to the
activities you identified earlier to see how you will access the data.
For example, you may need to list employees by first name in some
situations and by last name in others. To accommodate this requirement,
create a First Name attribute and a Last Name attribute, rather than a
single attribute that contains both names. With the names separate, you
can later create two indexes, one suited to each task.
♦ Choose consistent names. Consistency makes it easier to maintain your
database and easier to read reports and output windows.
For example, if you choose to use an abbreviated name such as
Emp_status for one attribute, you should not use a full name, such as
Employee_ID, for another attribute. Instead, the names should be
Emp_status and Emp_ID.
♦ At this stage, it is not crucial that the data be associated with the correct
entity. You can use your intuition. In the next section, you’ll apply tests
to check your judgment.
Why normalize?
The goals of normalization are to remove redundancy and to improve
consistency. For example, if you store a customer’s address in multiple
locations, it is difficult to update all copies correctly when they move.
☞ For more information about the normalization tests, see a book on
database design.
Normal forms There are several tests for data normalization. When your data passes the
first test, it is considered to be in first normal form. When it passes the
second test, it is in second normal form, and when it passes the third test, it
is in third normal form.
♦ Identify keys for relationships. The keys for a relationship are the keys
from the two entities that it joins.
♦ Check for calculated data in your supporting data list. Calculated data
is not normally stored in a relational database.
Department *Department ID
Department name
Heads *Department ID
*Employee ID
Member of *Department ID
*Employee ID
Skill *Skill ID
Skill name
Skill description
Expert in *Skill ID
*Employee ID
Skill level
Date acquired
Employee *Employee ID
last name
first name
Social security number
Address
phone number
date of birth
Putting data in first normal form ♦ To test for first normal form, look for attributes that can have repeating values.
♦ Remove attributes when multiple values can apply to a single item. Move
these repeating attributes to a new entity.
In the entity below, Phone number can repeat—an office can have more than
one telephone number.
Remove the repeating attribute and make a new entity called Telephone. Set
up a relationship between Office and Telephone.
[Diagram: the Office entity (Office code, Office address) has Telephones; the Telephone entity (Phone number) is located at an Office.]
Putting data in second normal form ♦ Remove data that does not depend on the whole key.
♦ Look only at entities and relationships whose identifier is composed of
more than one attribute. To test for second normal form, remove any data
that does not depend on the whole identifier. Each attribute should
depend on all of the attributes that comprise the identifier.
Employee ID
Department ID
Employee first name
Employee last name
Department name
Move the identifier Department ID, which the other employee data does not
depend on, to an entity of its own called Department. Also move any
attributes that depend on it. Create a relationship between Employee and
Department.
[Diagram: the Employee entity (Employee ID, Employee first name, Employee last name) works in the Department entity (Department ID, Department name); the Department contains Employees.]
Putting data in third normal form ♦ Remove data that doesn't depend directly on the key.
♦ To test for third normal form, remove any attributes that depend on other
attributes, rather than directly on the identifier.
In this example, the Employee and Office entity contains some attributes that
depend on its identifier, Employee ID. However, attributes such as Office
location and Office phone depend on another attribute, Office code. They do
not depend directly on the identifier, Employee ID.
Employee ID
Employee first name
Employee last name
Office code
Office location
Office phone
Remove Office code and those attributes that depend on it. Make another
entity called Office. Then, create a relationship that connects Employee with
Office.
[Diagram: the Employee entity (Employee ID, Employee first name, Employee last name) works out of the Office entity (Office code, Office location, Office phone); the Office houses Employees.]
[Diagram: the Employee entity (Employee Number, First name, Last name, Address) is a member of the Department entity (Department ID, Department name); the Department contains Employees.]
Notice that entities become tables. Identifiers in entities become (at least
part of) the primary key in a table. Attributes become columns. In a
one-to-many relationship, the identifier in the one entity will appear as a
new foreign key column in the many table.
Employee
Employee ID <pk>
Department ID <fk>
First Name
Last Name
Address
Department ID = Department ID
Department
Department ID <pk>
Department Name
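Expressed in SQL, the resolution sketched above might look like the following statements (a sketch only; the data types are assumptions, and spaces in the diagram's names are replaced with underscores):
CREATE TABLE department (
  department_id   INTEGER NOT NULL PRIMARY KEY,
  department_name CHAR(40)
);
CREATE TABLE employee (
  employee_id   INTEGER NOT NULL PRIMARY KEY,
  -- the identifier of the "one" entity becomes a foreign key column here
  department_id INTEGER NOT NULL,
  first_name    CHAR(20),
  last_name     CHAR(20),
  address       CHAR(80),
  FOREIGN KEY ( department_id ) REFERENCES department ( department_id )
);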
[Diagram: a Vehicle table (Vehicle ID <pk>, Model, Price) and a Truck table (Vehicle ID <fk>, Weight rating), joined on Vehicle ID.]
[Diagram: the Parts entity (Part Number, Description) and the Warehouse entity (Warehouse ID, Address) in a many-to-many storage relationship.]
The new Storage Location table relates the Parts and Warehouse tables.
[Diagram: the Parts table (Part Number <pk>, Description), the Storage Location table (Part Number <pk,fk>, Warehouse ID <pk,fk>), and the Warehouse table (Warehouse ID <pk>, Address); Storage Location joins to Parts on Part Number and to Warehouse on Warehouse ID.]
Resolving relationships that carry data Some of your relationships may carry data. This situation often occurs in many-to-many relationships.
[Diagram: the Parts entity (Part Number, Description) is stored at the Inventory entity (Quantity), which the Warehouse entity (Warehouse ID, Address) contains.]
If this is the case, each entity resolves to a table. Each role becomes a
foreign key that points to another table.
Inventory
Warehouse ID <pk,fk>
Part Number <pk,fk>
Quantity
The Inventory entity borrows its identifiers from the Parts and Warehouse
tables, because it depends on both of them. Once resolved, these borrowed
identifiers form the primary key of the Inventory table.
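A SQL sketch of the resolved Inventory table shows the borrowed identifiers forming the composite primary key (types are assumptions, and the Parts and Warehouse tables from the preceding diagrams are assumed to exist):
CREATE TABLE inventory (
  warehouse_id INTEGER NOT NULL,
  part_number  INTEGER NOT NULL,
  quantity     INTEGER,
  -- the two borrowed identifiers together identify each row
  PRIMARY KEY ( warehouse_id, part_number ),
  FOREIGN KEY ( warehouse_id ) REFERENCES warehouse ( warehouse_id ),
  FOREIGN KEY ( part_number ) REFERENCES parts ( part_number )
);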
Tip
A conceptual data model simplifies the design process because it hides a
lot of details. For example, a many-to-many relationship always generates
an extra table and two foreign key references. In a conceptual data model,
you can usually denote all of this structure with a single connection.
If you can answer yes to all the questions above, you are ready to implement
your design.
Final design Applying steps 1 through 3 to the database for the little company produces
the following entity-relationship diagram. This database is now in third
normal form.
[Diagram: the final conceptual design. The Skill entity (ID Number, Skill name, Skill description) is acquired by Employees, and Employees are capable of Skills; the dependent Expert In entity carries Skill Level and Date Acquired; the Department entity (Department ID, Department name) is headed by and contains Employees, and an Employee manages a Department; the Employee entity (Employee ID, First name, Last name, Home address) is a member of a Department, works out of an Office that houses Employees, and manages and reports to other Employees.]
[Diagram: the resolved physical design, with the following tables:
Skill (ID Number <pk>, Skill name, Skill description)
Expert In (ID Number <pk,fk>, Employee ID <pk,fk>, Skill Level, Date Acquired)
Department (Department ID <pk>, Employee ID <fk>, Department name)
Department/Employee (Department ID <pk,fk>, Employee ID <pk,fk>)
Employee (Employee ID <pk>, ID Number <fk>, Emp_Employee ID <fk>, First name, Last name, Home address)
Office (ID Number <pk>, Office name, Address)
Foreign keys link Expert In to Skill (ID Number) and to Employee (Employee ID), Department to its head Employee (Employee ID), Department/Employee to Department and to Employee, and Employee to its Office (ID Number) and to its manager (Emp_Employee ID).]
The long binary data type can be used to store information such as images
(for instance, stored as bitmaps) or word-processing documents in a
database. These types of information are commonly called binary large
objects, or BLOBs.
☞ For more information about each data type, see “SQL Data Types” [ASA
SQL Reference, page 53].
NULL and NOT NULL If the column value is mandatory for a record, you define the column as
being NOT NULL. Otherwise, the column is allowed to contain the NULL
value, which represents no value. The default in SQL is to allow NULL
values, but you should explicitly declare columns NOT NULL unless there
is a good reason to allow NULL values.
☞ For more information about the NULL value, see “NULL value” [ASA
SQL Reference, page 49]. For information on its use in comparisons, see
“Search conditions” [ASA SQL Reference, page 23].
Choosing constraints
Although the data type of a column restricts the values that are allowed in
that column (for example, only numbers or only dates), you may want to
further restrict the allowed values.
You can restrict the values of any column by specifying a CHECK
constraint. You can use any valid condition that could appear in a WHERE
clause to restrict the allowed values. Most CHECK constraints use either the
BETWEEN or IN condition.
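For example, the following statement (a sketch; the column and range are illustrative) limits department IDs to three-digit values:
ALTER TABLE department
ADD CHECK ( dept_id BETWEEN 100 AND 999 )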
☞ For more information about valid conditions, see “Search conditions”
[ASA SQL Reference, page 23]. For more information about assigning
constraints to tables and columns, see “Ensuring Data Integrity” on page 79.
Example The sample database has a table called Department, which has columns
named dept_id, dept_name, and dept_head_id. Its definition is as follows:
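(The definition below is a sketch; the exact types used in the sample database may differ slightly.)
CREATE TABLE department (
  dept_id      INTEGER NOT NULL,
  dept_name    CHAR(40) NOT NULL,
  dept_head_id INTEGER,
  PRIMARY KEY ( dept_id )
)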
If you specify NOT NULL, a column value must be supplied for every row
in the table.
CHAPTER 2
Working with Database Objects
About this chapter This chapter describes the mechanics of creating, altering, and deleting
database objects such as tables, views, and indexes.
Contents Topic: page
Introduction 30
Introduction
With the Adaptive Server Anywhere tools, you can create a database file to
hold your data. Once this file is created, you can begin managing the
database. For example, you can add database objects, such as tables or users,
and you can set overall database properties.
This chapter describes how to create a database and the objects within it. It
includes procedures for Sybase Central, Interactive SQL, and command-line
utilities. If you want more conceptual information before you begin, see the
following chapters:
♦ “Designing Your Database” on page 3
Creating a database
Adaptive Server Anywhere provides a number of ways to create a database:
in Sybase Central, in Interactive SQL, and at the command line. Creating a
database is also called initializing it. Once the database is created, you can
connect to it and build the tables and other objects that you need in the
database.
Transaction log When you create a database, you must decide where to place the transaction
log. This log stores all changes made to a database, in the order in which
they are made. In the event of a media failure on a database file, the
transaction log is essential for database recovery. It also makes your work
more efficient. By default, it is placed in the same directory as the database
file, but this is not recommended for production use.
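For example, the following command (a sketch; the directory name is illustrative) creates a database whose transaction log is kept on a separate drive:
dbinit -t d:\logs\company.log company.db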
☞ For more information on placing the transaction log, see “Configuring
your database for data protection” [ASA Database Administration Guide,
page 361].
Database file compatibility An Adaptive Server Anywhere database is an operating system file. It can be copied to other locations just as any other file is copied.
Database files are compatible among all operating systems, except where file
system file size limitations or Adaptive Server Anywhere support for large
files apply. A database created from any operating system can be used from
another operating system by copying the database file(s). Similarly, a
database created with a personal server can be used with a network server.
Adaptive Server Anywhere servers can manage databases created with
earlier versions of the software, but old servers cannot manage newer
databases.
☞ For more information about limitations, see “Size and number
limitations” [ASA Database Administration Guide, page 694].
Using other applications to create databases Some application design systems, such as Sybase PowerBuilder, contain tools for creating database objects. These tools construct SQL statements
that are submitted to the server, typically through its ODBC interface. If you
are using one of these tools, you do not need to construct SQL statements to
create tables, assign permissions, and so on.
This chapter describes the SQL statements for defining database objects.
You can use these statements directly if you are building your database from
an interactive SQL tool, such as Interactive SQL. Even if you are using an
application design tool, you may want to use SQL statements to add features
to the database if they are not supported by the design tool.
For more advanced use, database design tools such as Sybase
PowerDesigner provide a more thorough and reliable approach to
developing well-designed databases.
☞ For more information about database design, see “Designing Your
Database” on page 3.
You can create a database in Sybase Central using the Create Database
utility. After you have created a database, it appears under its server in the
left pane of Sybase Central.
For more information, see “Creating databases (SQL)” on page 33, and
“Creating databases (command line)” on page 33.
Sybase Central has features to make database creation easy for Windows CE
databases. If you have Windows CE services installed on your Windows or
Windows NT desktop, you have the option to create a Windows CE database
when you create a database from Sybase Central. Sybase Central enforces
the requirements for Windows CE databases, and optionally copies the
resulting database file to your Windows CE machine.
Example Create a database file in the c:\temp directory with the database name
temp.db.
CREATE DATABASE ’c:\\temp\\temp.db’
The directory path is relative to the database server. You set the permissions
required to execute this statement on the server command line, using the -gu
command-line option. The default setting requires DBA authority.
The backslash is an escape character in SQL, and must be doubled. For
more information, see “Strings” [ASA SQL Reference, page 9].
You can create a database from a command line with the dbinit utility. With
this utility, you can include command-line parameters to specify different
options for the database.
For example, the following command creates a database file company.db with a page size of 4096 bytes:
dbinit -p 4096 company.db
Erasing databases
Erasing a database deletes all tables and data from disk, including the
transaction log that records alterations to the database. All database files are
read-only to prevent accidental modification or deletion of the database files.
In Sybase Central, you can erase a database using the Erase Database utility.
You need to connect to a database to access this utility, but the Erase
Database wizard lets you specify any database for erasing. In order to erase
a non-running database, the database server must be running.
In Interactive SQL, you can erase a database using the DROP DATABASE
statement. Required permissions can be set using the database server -gu
command-line option. The default setting is to require DBA authority.
You can also erase a database from a command line with the dberase utility.
The database to be erased must not be running when this utility is used.
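For example, either of the following removes the temp.db file created earlier (a sketch; the first statement runs in Interactive SQL, the second at a command prompt):
DROP DATABASE 'c:\\temp\\temp.db'
dberase c:\temp\temp.db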
☞ For more information, see “DISCONNECT statement [ESQL]
[Interactive SQL]” [ASA SQL Reference, page 431], and “DROP
CONNECTION statement” [ASA SQL Reference, page 435].
3. In the right pane, click the appropriate tabs to edit the desired properties.
You can also view and edit properties on the object’s property sheet. To
view the property sheet, right-click the object and choose Properties from
the popup menu.
2. Right-click the desired database and choose Options from the popup
menu.
3. Edit the desired values.
Tips
With the Database Options dialog, you can also set database options for
specific users and groups (when you open this dialog for a user or group, it
is called the User Options dialog or Group Options dialog respectively).
When you set options for the database itself, you are actually setting options
for the PUBLIC group in that database because all users and groups inherit
option settings from PUBLIC.
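In Interactive SQL, the equivalent is to set the option for the PUBLIC group with a SET OPTION statement; for example (a sketch using the wait_for_commit option):
SET OPTION PUBLIC.wait_for_commit = 'On'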
❖ To display system objects in a database (Sybase Central)
1. Open the desired server.
2. Right-click the desired connected database and choose Filter Objects by
Owner.
3. Select SYS and dbo, and then click OK.
The system tables, system views, system procedures, and system domains
appear in their respective folders. For example, system tables appear
alongside normal tables in the Tables folder.
☞ For more information, see “System Tables” [ASA SQL Reference,
page 649].
Working with tables
When the database is first created, the only tables in the database are the
system tables. System tables hold the database schema.
This section describes how to create, alter, and delete tables. You can
execute the examples in Interactive SQL, but the SQL statements are
independent of the administration tool you use. When you execute queries in
Interactive SQL, you can edit the values in the result set.
☞ For more information, see “Editing table values in Interactive SQL”
[Introducing SQL Anywhere Studio, page 170].
To make it easier for you to re-create the database schema when necessary,
create command files to define the tables in your database. The command
files should contain the CREATE TABLE and ALTER TABLE statements.
☞ For more information about groups, tables, and connecting as another
user, see “Referring to tables owned by groups” [ASA Database Administration
Guide, page 417] and “Database object names and prefixes” [ASA Database
Administration Guide, page 421].
Creating tables
When a database is first created, the only tables in the database are the
system tables, which hold the database schema. You can then create new
tables to hold your actual data, either with SQL statements in
Interactive SQL or with Sybase Central.
There are two types of tables that you can create:
♦ Base table A table that holds persistent data. The table and its data
continue to exist until you explicitly delete the data or drop the table. It is
called a base table to distinguish it from temporary tables and from views.
♦ Temporary table Data in a temporary table is held for a single
connection only. Global temporary table definitions (but not data) are
kept in the database until dropped. Local temporary table definitions and
data exist for the duration of a single connection only.
☞ For more information about temporary tables, see “Working with
temporary tables” on page 76.
Tables consist of rows and columns. Each column carries a particular kind of
information, such as a phone number or a name, while each row specifies a
particular entry.
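For example, the following statements (a sketch; the table and column names are illustrative) create a base table and a global temporary table:
CREATE TABLE office (
  office_code CHAR(4)  NOT NULL PRIMARY KEY,
  office_name CHAR(40) NOT NULL,
  address     CHAR(80)
);
CREATE GLOBAL TEMPORARY TABLE office_scratch (
  office_code CHAR(4) NOT NULL,
  notes       CHAR(80)
) ON COMMIT PRESERVE ROWS;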
Altering tables
This section describes how to change the structure or column definitions of a
table. For example, you can add columns, change various column attributes,
or delete columns entirely.
In Sybase Central, you can perform these tasks on the SQL tab in the right
pane of Sybase Central. In Interactive SQL, you can perform these tasks
with the ALTER TABLE statement.
If you are working with Sybase Central, you can also manage columns (add
or remove them from the primary key, change their properties, or delete
them) by working with menu commands when you have a column selected
in the Columns folder.
☞ For information on altering database object properties, see “Setting
properties for database objects” on page 36.
☞ For information on granting and revoking table permissions, see
“Granting permissions on tables” [ASA Database Administration Guide,
page 404] and “Revoking user permissions” [ASA Database Administration
Guide, page 411].
You can alter tables in Sybase Central on the Columns tab in the right pane.
For example, you can add or delete columns, change column definitions, or
change table or column properties.
4. Click the Columns tab in the right pane and make the necessary
changes.
Tips
You can add columns by selecting a table’s Columns tab and choosing File
➤ New Column.
You can delete columns by selecting the column on the Columns tab and
choosing Edit ➤ Delete.
You can copy a column to a table by selecting the column on the Columns
tab in the right pane and then clicking the Copy button. Select the desired
table, click the Columns tab in the right pane, and then click the Paste
button.
It is also necessary to click the Save Table button or choose File ➤ Save
Table. Changes are not made to the table until then.
☞ For more information, see “ALTER TABLE statement” [ASA SQL
Reference, page 279].
You can alter tables in Interactive SQL using the ALTER TABLE statement.
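For example, a statement along the following lines (a sketch; it assumes a skill_description column previously declared with a longer length) shortens the column to a maximum of 80 characters:
ALTER TABLE skill
MODIFY skill_description CHAR(80)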
Any current entries that are longer than 80 characters are trimmed to
conform to the 80-character limit, and a warning appears.
The following statement changes the name of the skill_type column to
classification:
ALTER TABLE skill
RENAME skill_type TO classification
These examples show how to change the structure of the database. The
ALTER TABLE statement can change just about anything pertaining to a
table—you can use it to add or delete foreign keys, change columns from
one type to another, and so on. In all these cases, once you make the change,
stored procedures, views and any other item referring to this table will no
longer work.
☞ For more information, see “ALTER TABLE statement” [ASA SQL
Reference, page 279], and “Ensuring Data Integrity” on page 79.
Deleting tables
This section describes how to delete tables from a database. You can use
either Sybase Central or Interactive SQL to perform this task. In
Interactive SQL deleting a table is also called dropping it.
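For example, the following statement drops the skill table used in earlier examples:
DROP TABLE skill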
You cannot delete a table that is being used as an article in a SQL Remote
publication. If you try to do this in Sybase Central, an error appears.
3. Right-click the table and choose Delete from the popup menu.
You can edit the data in the table from the Interactive SQL Results tab or
from Sybase Central.
Managing primary keys (SQL)
You can create and modify the primary key in Interactive SQL using the
CREATE TABLE and ALTER TABLE statements. These statements let you
set many table attributes, including column constraints and checks.
Columns in the primary key cannot contain NULL values. You must specify
NOT NULL on the column in the primary key.
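Example 1 The following statement (a sketch; the column names and types are assumptions) creates the skill table with skill_id as its primary key:
CREATE TABLE skill (
  skill_id   INTEGER NOT NULL,
  skill_name CHAR(20) NOT NULL,
  skill_type CHAR(20) NOT NULL,
  PRIMARY KEY ( skill_id )
)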
The primary key values must be unique for each row in the table which, in
this case, means that you cannot have more than one row with a given
skill_id. Each row in a table is uniquely identified by its primary key.
Example 2 The following statement adds the columns skill_id and skill_type to the
primary key for the skill table:
ALTER TABLE skill
ADD PRIMARY KEY ( "skill_id", "skill_type" )
In Sybase Central, the foreign key of a table appears on the Foreign Keys tab
(located on the right pane when a table is selected).
You cannot create a foreign key in a table if the table contains values for the
foreign columns that can’t be matched to values in the primary table’s
primary key.
After you have created a foreign key, you can keep track of it on each table’s
Referencing Tables tab in the right pane; this folder displays any foreign
tables that reference the currently selected table.
3. Right-click the foreign key you want to delete and choose Delete from
the popup menu.
❖ To display which tables have foreign keys that reference a given
table (Sybase Central)
1. In the left pane, select the desired table.
2. In the right pane, click the Referencing Tables tab.
Tips
When you create a foreign key using the wizard, you can set properties
for the foreign key. To view properties after the foreign key is created,
right-click the foreign key on the Foreign Keys tab and choose Properties
from the popup menu.
You can view the properties of a referencing table by right-clicking the
table on the Referencing Tables tab and choosing Properties from the
popup menu.
You can create and modify the foreign key in Interactive SQL using the
CREATE TABLE and ALTER TABLE statements. These statements let you
set many table attributes, including column constraints and checks.
A table can only have one primary key defined, but it may have as many
foreign keys as necessary.
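Example 1 The following statement (a sketch; the column names match the discussion below, and the types are assumptions) creates the emp_skill table with a two-column primary key and two foreign keys:
CREATE TABLE emp_skill (
  emp_id   INTEGER NOT NULL,
  skill_id INTEGER NOT NULL,
  PRIMARY KEY ( emp_id, skill_id ),
  FOREIGN KEY ( emp_id ) REFERENCES employee ( emp_id ),
  FOREIGN KEY ( skill_id ) REFERENCES skill ( skill_id )
)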
The emp_skill table definition has a primary key that consists of two
columns: the emp_id column and the skill_id column. An employee may
have more than one skill, and so appear in several rows, and several
employees may possess a given skill, so that the skill_id may appear several
times. However, there may be no more than one entry for a given employee
and skill combination.
The emp_skill table also has two foreign keys. The foreign key entries
indicate that the emp_id column must contain a valid employee number from
the employee table, and that the skill_id must contain a valid entry from the
skill table.
Example 2 You can add a foreign key called foreignkey to the existing table skill and
reference this foreign key to the primary key in the table contact, as follows:
ALTER TABLE skill
ADD FOREIGN KEY "foreignkey" ("skill_id")
REFERENCES "DBA"."contact" ("id")
This example creates a relationship between the skill_id column of the table
skill (the foreign table) and the id column of the table contact (the primary
table). The “DBA” signifies the owner of the table contact.
Example 3 You can specify properties for the foreign key as you create it. For example,
the following statement creates the same foreign key as in Example 2, but it
defines the foreign key as NOT NULL along with restrictions for when you
update or delete.
ALTER TABLE skill
ADD NOT NULL FOREIGN KEY "foreignkey" ("skill_id")
REFERENCES "DBA"."contact" ("id")
ON UPDATE RESTRICT
ON DELETE RESTRICT
In Sybase Central, you can also specify properties in the Foreign Key
Creation wizard or on the foreign key’s property sheet.
☞ For more information, see “ALTER TABLE statement” [ASA SQL
Reference, page 279], and “Managing foreign keys (Sybase Central)” on
page 47.
Working with computed columns
A computed column is a column whose value is derived from other columns in the table. Consider the following table:
CREATE TABLE SHIPMENTS(
SHIPMENT_ID INTEGER NOT NULL PRIMARY KEY,
SHIPMENT_DATE TIMESTAMP,
PRODUCT_CODE CHAR(20) NOT NULL,
QUANTITY INTEGER NOT NULL,
TOTAL_PRICE DECIMAL(10,2) NOT NULL )
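Suppose you often need to find shipments by their average per-unit cost, with a query along these lines (a sketch; the price range is illustrative):
SELECT *
FROM shipments
WHERE ( total_price / quantity ) BETWEEN 10 AND 15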
but the predicate in the WHERE clause is not sargable since it does not refer
to a single base column. If the size of the SHIPMENTS table is relatively
large, an indexed retrieval might be appropriate rather than a sequential scan.
You can do this by creating a computed column AVERAGE_COST for the
SHIPMENTS table, as follows:
ALTER TABLE shipments
ADD average_cost DECIMAL(30,22)
COMPUTE (total_price / quantity)
Choosing the type of the computed column is important; the Adaptive Server
Anywhere optimizer replaces only complex expressions by a computed
column if the data type of the expression in the query precisely matches the
data type of the computed column. To determine what the type of any
expression is, you can use the EXPRTYPE() built-in function that returns the
expression’s type in ready-to-use SQL terms:
SELECT EXPRTYPE( ’SELECT (TOTAL_PRICE / QUANTITY ) AS X FROM
SHIPMENTS’, 1 )
FROM DUMMY
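Knowing the type, you can create an index on the computed column and restate the query in terms of it (a sketch; the index name idx_avg_cost is the one referred to below, and the price range is illustrative):
CREATE INDEX idx_avg_cost ON shipments ( average_cost );
SELECT *
FROM shipments
WHERE average_cost BETWEEN 10 AND 15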
and the predicate in the WHERE clause is now a sargable one, making it
possible for the optimizer to choose an indexed scan, using the new
idx_avg_cost index, for the query’s access plan. Values of computed
columns are automatically maintained by the database server as rows are
inserted and updated. Most applications should never need to update or
insert computed column values directly; however, since computed columns
are base columns like any other, they can be directly referenced in predicates
and in expressions when it makes sense to do so.
Although you can use INSERT, UPDATE, or LOAD TABLE statements to
insert values in computed columns, this is neither the recommended nor
intended application of this feature. The LOAD TABLE statement permits
the optional computation of computed columns, which can aid the DBA
during complex unload/reload sequences, or when it is vital that the value of
a computed column stay constant when the COMPUTE expression refers to
non-deterministic values, such as CURRENT TIMESTAMP.
Creating tables with The following CREATE TABLE statement is used to create the product table
computed columns in the Java sample tables:
CREATE TABLE product
(
id INTEGER NOT NULL,
JProd asademo.Product NOT NULL,
name CHAR(15) COMPUTE ( JProd>>name ),
PRIMARY KEY ("id")
)
Adding computed The following statement adds a computed column named inventory_value to
columns to tables the product table:
ALTER TABLE product
ADD inventory_value INTEGER
COMPUTE ( JProd.quantity * JProd.unit_price )
Modifying computed You can change the expression used in a computed column with the ALTER
column expressions TABLE statement. The following statement changes the expression that a
computed column is based on.
ALTER TABLE table_name
ALTER column-name SET COMPUTE ( expression )
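To stop a column from being a computed column, the corresponding form removes the COMPUTE expression (a sketch of the DROP COMPUTE clause):
ALTER TABLE table_name
ALTER column-name DROP COMPUTE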
Existing values in the column are not changed when this statement is
executed, but they are no longer updated automatically.
Working with views
Views are computed tables. You can use views to show database users
exactly the information you want to present, in a format you can control.
Similarities between views and base tables Views are similar to the permanent tables of the database (a permanent table is also called a base table) in many ways:
♦ You can assign access permissions to views just as to base tables.
♦ You can perform SELECT queries on views.
♦ You can perform UPDATE, INSERT, and DELETE operations on some
views.
♦ You can create views based on other views.
Differences between views and permanent tables There are some differences between views and permanent tables:
♦ You cannot create indexes on views.
♦ You cannot perform UPDATE, INSERT, and DELETE operations on all
views.
♦ You cannot assign integrity constraints and keys to views.
♦ Views refer to the information in base tables, but do not hold copies of
that information. Views are recomputed each time you invoke them.
Benefits of tailoring access Views let you tailor access to data in the database. Tailoring access serves several purposes:
♦ Improved security By allowing access to only the information that is
relevant.
♦ Improved usability By presenting users and application developers
with data in a more easily understood form than in the base tables.
♦ Improved consistency By centralizing in the database the definition of
common queries.
Creating views
When you browse data, a SELECT statement operates on one or more tables
and produces a result set that is also a table. Just like a base table, a result set
from a SELECT query has columns and rows. A view gives a name to a
particular query, and holds the definition in the database system tables.
Suppose you frequently need to list the number of employees in each
department. You can get this list with the following statement:
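SELECT dept_ID, count(*)
FROM employee
GROUP BY dept_ID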
You can create a view containing the results of this statement using either
Sybase Central or Interactive SQL.
Example Create a view called DepartmentSize that contains the results of the
SELECT statement given at the beginning of this section:
CREATE VIEW DepartmentSize AS
SELECT dept_ID, count(*)
FROM employee
GROUP BY dept_ID
☞ For more information, see “CREATE VIEW statement” [ASA SQL
Reference, page 406].
Using views
Restrictions on SELECT statements   There are some restrictions on the SELECT statements you can use as views. In particular, you cannot use an ORDER BY clause in the SELECT query. A
characteristic of relational tables is that there is no significance to the
ordering of the rows or columns, and using an ORDER BY clause would
impose an order on the rows of the view. You can use the GROUP BY
clause, subqueries, and joins in view definitions.
To develop a view, tune the SELECT query by itself until it provides exactly
the results you need in the format you want. Once you have the SELECT
query just right, you can add a phrase in front of the query to create the view.
For example,
CREATE VIEW viewname AS
Updating views   UPDATE, INSERT, and DELETE statements are allowed on some views, but not on others, depending on the view's associated SELECT statement.
You cannot update views containing aggregate functions, such as
COUNT(*). Nor can you update views containing a GROUP BY clause in
the SELECT statement, or views containing a UNION operation. In all these
cases, there is no way to translate the UPDATE into an action on the
underlying tables.
Copying views In Sybase Central, you can copy views between databases. To do so, select
the view in the right pane of Sybase Central and drag it to the Views folder
of another connected database. A new view is then created, and the original
view’s code is copied to it.
Note that only the view code is copied to the new view. The other view
properties, such as permissions, are not copied.
4. List all employees in the sales department by inspecting the view:
SELECT *
FROM sales_employee
When you create a view using the WITH CHECK OPTION, any UPDATE
or INSERT statement on the view is checked to ensure that the new row
matches the view condition. If it does not, the operation causes an error and
is rejected.
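For example, a sales_employee view restricted to the sales department might be defined with the check option as follows (a sketch only; the column list and the department id value of 200 are assumptions based on the sample database):
CREATE VIEW sales_employee AS
SELECT emp_id, emp_fname, emp_lname, dept_id
FROM employee
WHERE dept_id = 200
WITH CHECK OPTION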
With the check option in place, an UPDATE statement that sets dept_id to a value outside the view condition is rejected, generating the following error message:
Invalid value for column ’dept_id’ in table ’employee’
The check option is inherited   If a view (say V2) is defined on the sales_employee view, any updates or inserts on V2 that cause the WITH CHECK OPTION criterion on sales_employee to fail are rejected, even if V2 is defined without a check option.
Modifying views
You can modify a view using both Sybase Central and Interactive SQL.
When doing so, you cannot rename an existing view directly. Instead, you
must create a new view with the new name, copy the previous code to it, and
then delete the old view.
In Sybase Central, you can edit the code of views, procedures, and functions
on the object’s SQL tab in the right pane. You edit a view in a separate
window by right-clicking the view and choosing Edit In New Window from
the popup menu. In Interactive SQL, you can use the ALTER VIEW
statement to modify a view. The ALTER VIEW statement replaces a view
definition with a new definition, but it maintains the permissions on the view.
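For example, the DepartmentSize view created earlier might be redefined with a statement such as the following (the column alias is illustrative):
ALTER VIEW DepartmentSize AS
SELECT dept_ID, count(*) AS n_employees
FROM employee
GROUP BY dept_ID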
☞ For more information on altering database object properties, see
“Setting properties for database objects” on page 36.
☞ For more information on setting permissions, see “Granting permissions
on tables” [ASA Database Administration Guide, page 404] and “Granting
permissions on views” [ASA Database Administration Guide, page 406]. For
information about revoking permissions, see “Revoking user permissions”
[ASA Database Administration Guide, page 411].
Tip
If you wish to edit multiple views, you may wish to open separate
windows for each view rather than editing each view on the SQL tab
in the right pane. You can open a separate window by right-clicking a
view and choosing Edit In New Window from the popup menu.
Deleting views
You can delete a view in both Sybase Central and Interactive SQL.
❖ To delete a view (Sybase Central)
1. Open the Views folder.
2. Right-click the desired view and choose Delete from the popup menu.
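In Interactive SQL, the DROP VIEW statement removes a view. For example, to delete the DepartmentSize view created earlier:
DROP VIEW DepartmentSize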
Working with indexes
Performance is an important consideration when designing and creating your
database. Indexes can dramatically improve the performance of statements
that search for a specific row or a specific subset of the rows. On the other
hand, indexes take up additional disk space and may slow inserts, updates,
and deletes.
into memory by half.
The clustering of indexes in Adaptive Server Anywhere is approximate.
While the server attempts to preserve the key order, total clustering is not
guaranteed. As well, the clustering degrades over time, as more and more
rows are inserted into your database.
You can implement one clustered index per table, using the following
statements:
♦ The CREATE TABLE statement
♦ The ALTER TABLE statement
♦ The CREATE INDEX statement
♦ The DECLARE LOCAL TEMPORARY TABLE statement
Several statements work in conjunction with each other to allow you to
maintain and restore the clustering effect:
♦ The UNLOAD TABLE statement allows you to unload a table in the
order of the index key.
♦ The LOAD TABLE statement inserts rows into the table in the order of
the index key.
♦ The INSERT statement attempts to put new rows on the same table page
as the one containing adjacent rows as per the primary key order.
♦ The REORGANIZE TABLE statement can restore the clustering by rearranging the rows according to the clustering index. On tables where no clustering index is specified, rows are ordered using the primary key.
The Optimizer assumes that the table rows are stored in key order and costs
index scans accordingly.
Creating indexes
Indexes are created on one or more columns of a specified table. You can
create indexes on base tables or temporary tables, but you cannot create an
index on a view. To create an individual index, you can use either Sybase
Central or Interactive SQL. You can use the Index Consultant to guide you in
a proper selection of indexes for your database.
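As a sketch, an ordinary index and a clustered index might be created with statements such as the following (the index names are illustrative; the tables and columns are from the sample database):
CREATE INDEX idx_emp_lname
ON employee ( emp_lname );

CREATE CLUSTERED INDEX idx_order_date
ON sales_order ( order_date );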
Validating indexes
You can validate an index to ensure that every row referenced in the index
actually exists in the table. For foreign key indexes, a validation check also
ensures that the corresponding row exists in the primary table, and that their
hash values match. This check complements the validity checking carried
out by the VALIDATE TABLE statement.
❖ To validate an index (Sybase Central)
1. Connect to a database with DBA authority or as the owner of the table on
which the index is created.
2. In the left pane, open the Indexes folder.
3. Right-click the desired index and choose Validate from the popup menu.
Dropping indexes
If an index is no longer required, you can remove it from the database in
Sybase Central or in Interactive SQL.
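In Interactive SQL, a statement along the following lines removes an index (a sketch; the table and index names are illustrative):
DROP INDEX product.idx_avg_cost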
2. Capturing a workload.
A workload is a set of queries or other data manipulation statements over
which the Index Consultant tries to optimize performance. Depending on
your goals, you may wish to make your workload a representative set of
operations, or you may wish to identify a set of key bottleneck
operations. The Index Consultant can capture one or more workloads,
and can store them for later analysis.
For more information, see “Understanding workloads” on page 70.
3. Analyzing the workload.
The Index Consultant analyzes a workload or single query by generating
candidate indexes and exploring their effect on performance. To explore
the effect of different candidate indexes, the Index Consultant repeatedly
re-optimizes the queries in the workload under different sets of indexes.
It does not execute the queries.
The Index Consultant can also store the results of multiple analyses on
any workload, using different settings.
When analyzing a workload, the Index Consultant presents you with a set
of options:
♦ Recommend clustered indexes If this option is selected, the Index
Consultant analyzes the effect of clustered indexes as well as
unclustered indexes.
Properly selected clustered indexes can provide significant
performance improvements over unclustered indexes for some
workloads, but you must reorganize the table (using the
REORGANIZE TABLE statement) for them to be effective. In
addition, the analysis takes longer if the effects of clustered indexes are
considered.
☞ For more information about clustered indexes, see “Using
clustered indexes” on page 63.
♦ Keep existing secondary indexes The Index Consultant can carry
out its analysis by either maintaining the existing set of secondary
indexes in the database, or by ignoring the existing secondary indexes.
A secondary index is an index that is not a unique constraint or a
primary or foreign key. Indexes that are present to enforce referential
integrity constraints are always considered when selecting access
plans.
The analysis includes the following steps:
♦ Generate candidate indexes For each workload, the Index
Consultant generates a set of candidate indexes. Creating a real index
on a large table can be a time consuming operation, so the Index
Consultant creates its candidates as virtual indexes. A virtual index
cannot be used to actually execute queries, but the query optimizer can
use virtual indexes to estimate the cost of execution plans as if such an
index were available. Virtual indexes allow the Index Consultant to
carry out “what-if” analysis without the expense of creating and
managing real indexes. Virtual indexes have a limit of four columns.
♦ Testing the benefits and costs of candidate indexes The Index
Consultant asks the Adaptive Server Anywhere query optimizer to
estimate the cost of executing the queries in the workload, with and
without different combinations of candidate indexes.
♦ Sybase Central You can use the Index Consultant from Sybase Central
to analyze the benefits of indexes for a workload, or set of database
requests.
Stopping the Index Consultant
The Index Consultant does have an impact on database performance while it
is capturing a workload for analysis. For this reason, the database server
window displays an informational message while the Index Consultant is
capturing a workload.
Usually, the Index Consultant is stopped from the Sybase Central user
interface. There may be occasions when it is necessary to stop the Index
Consultant from another machine: for example, if the Index Consultant is
inadvertently left running.
To pause or stop the capturing of a workload, call the
sa_pause_workload_capture or sa_stop_workload_capture procedure,
respectively:
call sa_pause_workload_capture;
call sa_stop_workload_capture;
Understanding workloads
Caution
Do not change the database schema during the workload capture step. Do
not change the database schema between the capture step and the analysis
step. Such changes invalidate the Index Consultant recommendations.
The analysis step is carried out by repeatedly carrying out the following set
of operations:
1. Create a candidate set of virtual indexes.
A virtual index contains no actual data and cannot be used for actual
query execution. It can be used by the optimizer when selecting an
appropriate execution plan for a query. The Index Consultant generates
many alternative sets of virtual indexes.
2. Optimize the workload operations for this candidate set of virtual indexes.
The Index Consultant retrieves the plan for each statement in the
workload, as chosen by the Adaptive Server Anywhere query optimizer.
The optimizer considers applicable virtual indexes from the candidate set
for each statement. However, the statements are not executed. The Index
Consultant does not modify user data.
For each query, the query optimizer compares the execution cost of many
alternative execution plans. It estimates the execution cost based on an
internal cost model. One important choice that influences the cost of each
execution plan is which tables to access using an index, and which to
access without using an index. Each set of virtual indexes opens up new
execution plan alternatives and closes others.
The cost model depends on the state of the database server. Specifically,
although the Index Consultant itself does not read data from disk or
execute operations against the database, the cost model depends on which
data is in cache and which must be accessed from disk. Therefore,
running the Index Consultant may not generate the same
recommendations each time you run an analysis on a particular workload.
For example, running the Index Consultant on a database server that has
just started up (and so has no data in cache) may provide different
recommendations than running it on a database server that has been in
operation for some time.
☞ For more information about the cost model, see “Optimizer
estimates” on page 397.
The Index Consultant provides a set of tabs with the results of a given
analysis. The results of an analysis can be saved for later review.
Summary tab The Summary tab provides an overview of the workload and the analysis,
including such information as the number of queries in the workload, the
number of recommended indexes, the number of pages required for the
recommended indexes, and the benefit that the recommended indexes are
expected to yield. The benefit number is measured in internal units of cost.
Recommended Indexes tab   The Recommended Indexes tab contains data about each of the recommended indexes. Among the information provided is the following:
♦ Clustered Each table can have at most one clustered index. In some
cases, a clustered index can provide significantly more benefit than an
unclustered index.
Log tab The Log tab lists activities that have been completed for this analysis.
Working with temporary tables
Temporary tables, whether local or global, serve the same purpose:
temporary storage of data. The difference between the two, and the
advantages of each, however, lies in the duration each table exists.
A local temporary table exists only for the duration of a connection or, if
defined inside a compound statement, for the duration of the compound
statement.
☞ For more information, see “DECLARE LOCAL TEMPORARY TABLE
statement” [ASA SQL Reference, page 421].
The definition of the global temporary table remains in the database
permanently, but the rows exist only within a given connection. When you
close the database connection, the data in the global temporary table
disappears. However, the table definition remains with the database for you
to access when you open your database next time.
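As a sketch, the two kinds of temporary table might be created as follows (the table and column names are illustrative):
CREATE GLOBAL TEMPORARY TABLE temp_order_totals (
   order_id INTEGER NOT NULL PRIMARY KEY,
   total    NUMERIC(14,2)
) ON COMMIT PRESERVE ROWS;

DECLARE LOCAL TEMPORARY TABLE temp_dept_counts (
   dept_id   INTEGER,
   emp_count INTEGER
) ON COMMIT DELETE ROWS;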
Temporary tables are stored in the temporary file. Like any other dbspace,
pages from the temporary file can be cached. Operations on temporary
tables are never written to the transaction log.
☞ For more information, see “CREATE TABLE statement” [ASA SQL
Reference, page 385].
♦ Indexes
♦ Message types
♦ Triggers
♦ UltraLite projects
♦ UltraLite statements
♦ Unique constraints
♦ Users
♦ Views
♦ Web services
CHAPTER 3
Ensuring Data Integrity
About this chapter Building integrity constraints right into the database is the surest way to
make sure your data stays in good shape. This chapter describes the facilities
in Adaptive Server Anywhere for ensuring that the data in your database is
valid and reliable.
You can enforce several types of integrity constraints. For example, you can
ensure individual entries are correct by imposing constraints and CHECK
constraints on tables and columns. Setting column properties by choosing an
appropriate data type or setting special default values assists this task.
The SQL statements in this chapter use the CREATE TABLE and ALTER
TABLE statements, basic forms of which were introduced in “Working with
Database Objects” on page 29.
Contents Topic: page
Using domains 93
Data integrity overview
If data has integrity, the data is valid—correct and accurate—and the
relational structure of the database is intact. Referential integrity constraints
enforce the relational structure of the database. These rules maintain the
consistency of data between tables.
Adaptive Server Anywhere supports stored procedures, which give you
detailed control over how data enters the database. You can also create
triggers, or customized stored procedures invoked automatically when a
certain action, such as an update of a particular column, occurs.
☞ For more information on procedures and triggers see “Using Procedures,
Triggers, and Batches” on page 645.
As well, column constraints can be inherited from domains. For more
information on these and other table and column constraints, see “Using
table and column constraints” on page 89.
Entity and referential integrity   Relationships, defined by the primary keys and foreign keys, tie together the information in relational database tables. You must build these relations directly into the database design. The following integrity rules maintain the
structure of the database:
♦ Entity integrity Keeps track of the primary keys. It guarantees that every row of a given table can be uniquely identified by a primary key value that is never NULL.
♦ Referential integrity Keeps track of the foreign keys that define the
relationships between tables. It guarantees that all foreign key values
either match a value in the corresponding primary key or contain the
NULL value if they are defined to allow NULL.
☞ For more information about enforcing referential integrity, see
“Enforcing entity and referential integrity” on page 96. For more
information about designing appropriate primary and foreign key relations,
see “Designing Your Database” on page 3.
Triggers for advanced integrity rules   You can also use triggers to maintain data integrity. A trigger is a procedure stored in the database and executed automatically whenever the information
in a specified table changes. Triggers are a powerful mechanism for database
administrators and developers to ensure that data remains reliable.
☞ For more information about triggers, see “Using Procedures, Triggers,
and Batches” on page 645.
☞ Each of the other default values is specified in a similar manner. For
more information, see “ALTER TABLE statement” [ASA SQL Reference,
page 279] and “CREATE TABLE statement” [ASA SQL Reference, page 385].
Current timestamp The current timestamp is similar to the current date default, but offers
greater accuracy. For example, a user of a contact management application
may have several contacts with a single customer in one day: the current
timestamp default would be useful to distinguish these contacts.
Since it records a date and the time down to a precision of millionths of a
second, you may also find the current timestamp useful when the sequence
of events is important in a database.
For more information about timestamps, times, and dates, see “SQL Data
Types” [ASA SQL Reference, page 53].
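For example, a contact-tracking table might record the time of each contact automatically with such a default (a sketch; the table and column names are hypothetical):
CREATE TABLE customer_contact (
   contact_id   INTEGER DEFAULT AUTOINCREMENT PRIMARY KEY,
   customer_id  INTEGER,
   contact_time TIMESTAMP DEFAULT CURRENT TIMESTAMP
)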
greater than 2^31 – 1 or large double or numeric values may cause wraparound to negative values.
☞ You can retrieve the most recent value inserted into an autoincrement
column using the @@identity global variable. For more information, see
“@@identity global variable” [ASA SQL Reference, page 46].
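Continuing the sketch above, the value just generated for the autoincrement column can be retrieved immediately after an insert:
INSERT INTO customer_contact ( customer_id ) VALUES ( 101 );
SELECT @@identity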
Autoincrement and negative numbers   Autoincrement is intended to work with positive integers.
The initial autoincrement value is set to 0 when the table is created. This
value remains as the highest value assigned when inserts are done that
explicitly insert negative values into the column. An insert where no value is
supplied causes the AUTOINCREMENT to generate a value of 1, forcing
any other generated values to be positive.
In UltraLite applications, the autoincrement value is not set to 0 when the
table is created, and AUTOINCREMENT generates negative numbers when
a signed data type is used for the column.
You should define AUTOINCREMENT columns as unsigned to prevent
negative values from being used.
Autoincrement and the IDENTITY column   ☞ A column with the AUTOINCREMENT default is referred to in Transact-SQL applications as an IDENTITY column. For information on IDENTITY columns, see “The special IDENTITY column” on page 488.
contain entries such as the date fifteen days from today, which would be
entered as
... DEFAULT ( dateadd( day, 15, getdate() ) )
Caution
Altering tables can interfere with other users of the database. Although
you can execute the ALTER TABLE statement while other connections
are active, you cannot execute the ALTER TABLE statement if any other
connection is using the table you want to alter. For large tables, ALTER
TABLE is a time-consuming operation, and all other requests referencing
the table being altered are prohibited while the statement is processing.
This section describes how to use constraints to help ensure the accuracy of
data in the table.
Example 2 ♦ You can ensure that the entry matches one of a limited number of values.
For example, to ensure that a city column only contains one of a certain
number of allowed cities (say, those cities where the organization has
offices), you could use a constraint such as:
ALTER TABLE office
MODIFY city
CHECK ( city IN ( ’city_1’, ’city_2’, ’city_3’ ) )
♦ By default, string comparisons are case insensitive unless the database is
explicitly created as a case-sensitive database.
Example 3 ♦ You can ensure that a date or number falls in a particular range. For
example, you may require that the start_date column of an employee
table must be between the date the organization was formed and the
current date using the following constraint:
ALTER TABLE employee
MODIFY start_date
CHECK ( start_date BETWEEN ’1983/06/27’
AND CURRENT DATE )
♦ You can use several date formats. The YYYY/MM/DD format in this
example has the virtue of always being recognized regardless of the
current option settings.
Column CHECK tests fail only if the condition returns a value of FALSE. If
the condition returns a value of UNKNOWN, the change is allowed.
variable prefixed with the @ sign is replaced by the name of the column
when the CHECK constraint is evaluated, any variable name prefixed with
@ could be used instead of @col.
CREATE DATATYPE posint INT
CHECK ( @col > 0 )
An ALTER TABLE statement with the DELETE CHECK clause deletes all
CHECK constraints from the table definition, including those inherited from
domains.
Any changes made to a constraint in a domain definition after a column is defined on that domain are not applied to the column. The column gets the constraints from the domain when it is created, but there is no further connection between the two.
☞ For more information about domains, see “Domains” [ASA SQL
Reference, page 76].
❖ To manage constraints
1. Open the Tables folder.
2. In the right pane, double-click the table you want to alter.
3. The right pane has separate tabs for unique constraints and check
constraints.
4. Make the appropriate changes to the constraint you wish to modify. For
example, to add a table or column constraint, click the Check Constraints
tab and choose File ➤ New ➤ Table Check Constraint or File ➤ New ➤
Column Check Constraint.
ALTER TABLE customer
MODIFY phone CHECK NULL
Sybase Central lets you add, modify and delete both table and column
CHECK constraints. For more information, see “Working with table and
column constraints in Sybase Central” on page 91.
Deleting a column from a table does not delete CHECK constraints associated with that column if they are held in a table constraint. Leaving such constraints in place produces a column not found error message upon any attempt to insert, or even just query, data in the table.
Table CHECK constraints fail only if a value of FALSE is returned. A value
of UNKNOWN allows the change.
Using domains
A domain is a user-defined data type that, together with other attributes, can
restrict the range of acceptable values or provide defaults. A domain extends
one of the built-in data types. The range of permissible values is usually
restricted by a check constraint. In addition, a domain can specify a default
value and may or may not allow nulls.
You can define your own domains for a number of reasons.
♦ A number of common errors can be prevented if inappropriate values
cannot be entered. A constraint placed on a domain ensures that all
columns and variables intended to hold values in a desired range or
format can hold only the intended values. For example, a data type can
ensure that all credit card numbers typed into the database contain the
correct number of digits.
❖ To assign domains to columns (Sybase Central)
1. For the desired table, click the Columns tab in the right pane.
2. In the data type column for the desired column, either:
♦ Select the domain from the dropdown list, or
♦ Click the button next to the dropdown list and choose the domain on
the property sheet.
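The domains used below might have been defined with statements like the following (a sketch; the exact lengths are assumptions):
CREATE DOMAIN persons_name CHAR(30);
CREATE DOMAIN street_address CHAR(35)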
Having defined these domains, you can use them much as you would the
built-in data types. For example, you can use these definitions to define a
table as follows.
CREATE TABLE customer (
   id      INT DEFAULT AUTOINCREMENT PRIMARY KEY,
   name    persons_name,
   address street_address
)
Example 2: Default values, check constraints, and identifiers   In the above example, the table's primary key is specified to be of type integer. Indeed, many of your tables may require similar identifiers. Instead of specifying that these are integers, it is much more convenient to create an identifier domain for use in these applications.
When you create a domain, you can specify a default value and provide
a check constraint to ensure that no inappropriate values are typed into any
column of this type.
Integer values are commonly used as table identifiers. A good choice for
unique identifiers is to use positive integers. Since such identifiers are likely
to be used in many tables, you could define the following domain.
CREATE DOMAIN identifier INT
DEFAULT AUTOINCREMENT
CHECK ( @col > 0 )
This check constraint uses the variable @col. Using this definition, you can
rewrite the definition of the customer table, shown above.
CREATE TABLE customer (
   id      identifier PRIMARY KEY,
   name    persons_name,
   address street_address
)
Example 3: Built-in domains   Adaptive Server Anywhere comes with some domains pre-defined. You can use these pre-defined domains as you would a domain that you created yourself. For example, the following monetary domain has already been created for you.
CREATE DOMAIN MONEY NUMERIC(19,4)
NULL
Deleting domains
You can use either Sybase Central or a DROP DOMAIN statement to delete
a domain.
Only the user DBA or the user who created a domain can drop it. In
addition, since a domain cannot be dropped if any variable or column in the
database is an instance of the domain, you need to first drop any columns or
variables of that type before you can drop the domain.
❖ To delete a domain (Sybase Central)
1. Open the Domains folder.
2. Right-click the desired domain and choose Delete from the popup menu.
Enforcing entity and referential integrity
The relational structure of the database enables the personal server to
identify information within the database, and ensures that all the rows in
each table uphold the relationships between tables (described in the database
structure).
The table owner defines the primary key for a table when they create it. If
they modify the structure of a table at a later date, they can also redefine the
primary key.
Some application development systems and database design tools allow you
to create and alter database tables. If you are using such a system, you may
not have to enter the CREATE TABLE or ALTER TABLE statement
explicitly: the application may generate the statement itself from the
information you provide.
☞ For more information about creating primary keys, see “Managing
primary keys” on page 45. For the detailed syntax of the CREATE TABLE
statement, see “CREATE TABLE statement” [ASA SQL Reference, page 385].
For information about changing table structure, see the “ALTER TABLE
statement” [ASA SQL Reference, page 279].
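A minimal sketch of defining, and later redefining, a primary key (the table is hypothetical):
CREATE TABLE sales_region (
   region_id   INTEGER NOT NULL,
   region_name CHAR(30) NOT NULL,
   PRIMARY KEY ( region_id )
);

-- Redefine the primary key at a later date:
ALTER TABLE sales_region
   DELETE PRIMARY KEY;
ALTER TABLE sales_region
   ADD PRIMARY KEY ( region_id )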
can allow NULL values, and is optional.
♦ SET NULL Sets all foreign keys that reference the modified primary key
to NULL.
♦ SET DEFAULT Sets all foreign keys that reference the modified primary
key to the default value for that column (as specified in the table
definition).
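For example, these actions might be declared on a foreign key as follows (a sketch; the tables are hypothetical, and the default region value must correspond to an existing row for the SET DEFAULT action to succeed):
CREATE TABLE region_account (
   account_id INTEGER NOT NULL PRIMARY KEY,
   region_id  INTEGER DEFAULT 100,
   FOREIGN KEY ( region_id ) REFERENCES sales_region ( region_id )
      ON UPDATE SET NULL
      ON DELETE SET DEFAULT
)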
The following statement gives the error primary key for row in table
’department’ is referenced in another table:
DELETE FROM department
WHERE dept_id = 200
CHAPTER 4
Using Transactions and Isolation Levels
About this chapter You can group SQL statements into transactions, which have the property
that either all statements are executed or none is executed. You should
design each transaction to perform a task that changes your database from
one consistent state to another.
This chapter describes transactions and how to use them in applications. It also describes how you can set isolation levels in Adaptive Server Anywhere to limit interference among concurrent transactions.
Contents Topic: page
Summary 155
Introduction to transactions
To ensure data integrity, it is essential that you can identify states in which
the information in your database is consistent. The concept of consistency is
best illustrated through an example:
Consistency example Suppose you use your database to handle financial accounts, and you wish to
transfer money from one client’s account to another. The database is in a
consistent state both before and after the money is transferred; but it is not in
a consistent state after you have debited money from one account and before
you have credited it to the second. During a transferal of money, the
database is in a consistent state when the total amount of money in the
clients’ accounts is as it was before any money was transferred. When the
money has been half transferred, the database is in an inconsistent state.
Either both or neither of the debit and the credit must be processed.
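For example, the transfer might be carried out as a single transaction (a sketch; the account table and its columns are hypothetical):
UPDATE account SET balance = balance - 100 WHERE account_number = 123;
UPDATE account SET balance = balance + 100 WHERE account_number = 456;
COMMIT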
Transactions are logical units of work   A transaction is a logical unit of work. Each transaction is a sequence of logically related commands that accomplish one task and transform the
database from one consistent state into another. The nature of a consistent
state depends on your database.
The statements within a transaction are treated as an indivisible unit: either
all are executed or none is executed. At the end of each transaction, you
commit your changes to make them permanent. If for any reason some of
the commands in the transaction do not process properly, then any
intermediate changes are undone, or rolled back. Another way of saying
this is that transactions are atomic.
Grouping statements into transactions is key both to protecting the
consistency of your data (even in the event of media or system failure), and
to managing concurrent database operations. Transactions may be safely
interleaved and the completion of each transaction marks a point at which
the information in the database is consistent.
In the event of a system failure or database crash during normal operation,
Adaptive Server Anywhere performs automatic recovery of your data when
the database is next started. The automatic recovery process recovers all
completed transactions, and rolls back any transactions that were
uncommitted when the failure occurred. The atomic character of
transactions ensures that databases are recovered to a consistent state.
☞ For more information about database backups and data recovery, see
“Backup and Data Recovery” [ASA Database Administration Guide, page 343].
For more information about concurrent database usage, see “Introduction to
concurrency” on page 106.
Using transactions
Adaptive Server Anywhere expects you to group your commands into
transactions. Knowing which commands or actions signify the start or end of
a transaction lets you take full advantage of this feature.
Starting transactions Transactions start with one of the following events:
♦ The first statement following a connection to a database
♦ The first statement following the end of a transaction
Completing transactions Transactions complete with one of the following events:
♦ The setting of the option COMMIT_ON_EXIT controls what happens to
uncommitted changes when you exit Interactive SQL. If this option is set
to ON (the default), Interactive SQL does a COMMIT; otherwise it
undoes your uncommitted changes with a ROLLBACK statement.
☞ Adaptive Server Anywhere also supports Transact-SQL commands such
as BEGIN TRANSACTION, for compatibility with Sybase Adaptive Server
Enterprise. For further information, see “Transact-SQL Compatibility” on
page 471.
Introduction to concurrency
Concurrency is the ability of the database server to process multiple
transactions at the same time. Were it not for special mechanisms within the
database server, concurrent transactions could interfere with each other to
produce inconsistent and incorrect information.
Example A database system in a department store must allow many clerks to update
customer accounts concurrently. Each clerk must be able to update the status
of the accounts as they assist each customer: they cannot afford to wait until
no one else is using the database.
Who needs to know about concurrency   Concurrency is a concern to all database administrators and developers. Even if you are working with a single-user database, you must be concerned
with concurrency if you want to process requests from multiple applications
or even from multiple connections from a single application. These
applications and connections can interfere with each other in exactly the
same way as multiple users in a network setting.
Transaction size affects concurrency   The way you group SQL statements into transactions can have significant effects on data integrity and on system performance. If you make a
transaction too short and it does not contain an entire logical unit of work,
then inconsistencies can be introduced into the database. If you write a
transaction that is too long and contains several unrelated actions, then there
is greater chance that a ROLLBACK will unnecessarily undo work that
could have been committed quite safely into the database.
If your transactions are long, they can lower concurrency by preventing
other transactions from being processed concurrently.
There are many factors that determine the appropriate length of a
transaction, depending on the type of application and the environment.
Isolation levels and consistency
There are four isolation levels   Adaptive Server Anywhere allows you to control the degree to which the operations in one transaction are visible to the operations in other concurrent
transactions. You do so by setting a database option called the isolation
level. Adaptive Server Anywhere has four different isolation levels
(numbered 0 through 3) that prevent some or all interference. Level 3
provides the highest level of isolation. Lower levels allow more
inconsistencies, but typically have better performance. Level 0 is the default
setting.
All isolation levels guarantee that each transaction will execute completely
or not at all, and that no updates will be lost.
Other types of inconsistencies can also exist. These three were chosen for
the ISO SQL/92 standard because they are typical problems and because it
was convenient to describe amounts of locking between transactions in
terms of them.
Isolation levels and dirty reads, non-repeatable reads, and phantom rows   The isolation levels are different with respect to the type of inconsistent behavior that Adaptive Server Anywhere allows. In the following table, an x means that the behavior is prevented, and a ✔ means that the behavior may occur.
Isolation level    Dirty reads    Non-repeatable reads    Phantom rows
0                  ✔              ✔                       ✔
1                  x              ✔                       ✔
2                  x              x                       ✔
3                  x              x                       x
This table demonstrates two points:
♦ Each isolation level eliminates one of the three typical types of
inconsistencies.
♦ Each level eliminates the types of inconsistencies eliminated at all lower
levels.
The four isolation levels have different names under ODBC. These names
are based on the names of the inconsistencies that they prevent, and are
described in “The ValuePtr parameter” on page 111.
Cursor instability
multiple tables. More than one table will likely be involved whenever you
use a join or sub-selection within a SELECT statement.
☞ For information on programming SQL procedures and cursors, see
“Using Procedures, Triggers, and Batches” on page 645.
☞ Cursors are used only when you are using Adaptive Server Anywhere
through another application. For more information, see “Using SQL in
Applications” [ASA Programming Guide, page 11].
A related but distinct concern for applications using cursors is whether
changes to underlying data are visible to the application. You can control the
changes that are visible to applications by specifying the sensitivity of the
cursor.
☞ For more information about cursor sensitivity, see “Adaptive Server
Anywhere cursors” [ASA Programming Guide, page 30].
You can change the isolation of your connection and the default level
associated with your user ID using the SET OPTION command. If you have
permission, you can also change the isolation level for other users or groups.
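As a sketch, the statements take forms such as the following (the level values are illustrative):
SET TEMPORARY OPTION ISOLATION_LEVEL = 3;    -- current connection only
SET OPTION ISOLATION_LEVEL = 2;              -- default for your own user ID
SET OPTION PUBLIC.ISOLATION_LEVEL = 1        -- default for the PUBLIC group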
Once you disconnect, your isolation level reverts to its previous value.
Default isolation level When you connect to a database, the database server determines your initial
isolation level as follows:
1. A default isolation level may be set for each user and group. If a level is
stored in the database for your user ID, then the database server uses it.
2. If not, the database server checks the groups to which you belong until it
finds a level. All users are members of the special group PUBLIC. If it
finds no other setting first, then Adaptive Server Anywhere will use the
level assigned to that group.
☞ For more information about users and groups, see “Managing User IDs
and Permissions” [ASA Database Administration Guide, page 397].
☞ For more information about the SET OPTION statement syntax, see
“SET OPTION statement” [ASA SQL Reference, page 591].
☞ You may wish to change the isolation level in mid-transaction if, for
example, just one table or group of tables requires serialized access. For
information about changing the isolation level within a transaction, see
“Changing isolation levels within a transaction” on page 113.
ValuePtr                      Isolation Level
SQL_TXN_READ_UNCOMMITTED      0
SQL_TXN_READ_COMMITTED        1
SQL_TXN_REPEATABLE_READ       2
SQL_TXN_SERIALIZABLE          3
Changing an isolation level via ODBC   You can change the isolation level of your connection via ODBC using the function SQLSetConnectOption in the library ODBC32.dll.
The SQLSetConnectOption function takes three parameters: the value of
the ODBC connection handle, the fact that you wish to set the isolation
level, and the value corresponding to the isolation level. These values appear
in the table below.
String                        Value
SQL_TXN_ISOLATION             108
SQL_TXN_READ_UNCOMMITTED      1
SQL_TXN_READ_COMMITTED        2
SQL_TXN_REPEATABLE_READ       4
SQL_TXN_SERIALIZABLE          8
Do not use the SET OPTION statement to change an isolation level from within an ODBC application. Because the ODBC driver does not parse the statements it passes to the database server, a change made this way is not recognized by the ODBC driver.
Example The following function call sets the isolation level of the connection
MyConnection to isolation level 2:
SQLSetConnectOption( MyConnection.hDbc,
SQL_TXN_ISOLATION,
SQL_TXN_REPEATABLE_READ )
ODBC uses the isolation feature to support assorted database lock options.
For example, in PowerBuilder you can use the Lock attribute of the
transaction object to set the isolation level when you connect to the database.
The Lock attribute is a string, and is set as follows:
SQLCA.lock = "RU"
The Lock option is honored only at the moment the CONNECT occurs.
Changes to the Lock attribute after the CONNECT have no effect on the
connection.
Sometimes you will find that different isolation levels are suitable for
different parts of a single transaction. Adaptive Server Anywhere allows you
to change the isolation level of your database in the middle of a transaction.
When you change the ISOLATION_LEVEL option in the middle of a
transaction, the new setting affects only the following:
♦ Any cursors opened after the change
Transaction blocking and deadlock
When a transaction is being executed, the database server places locks on
rows to prevent other transactions from interfering with the affected rows.
Locks control the amount and types of interference permitted.
Adaptive Server Anywhere uses transaction blocking to allow transactions
to execute concurrently without interference, or with limited interference.
Any transaction can acquire a lock to prevent other concurrent transactions
from modifying or even accessing a particular row. This transaction
blocking scheme always stops some types of interference. For example, a
transaction that is updating a particular row of a table always acquires a lock
on that row to ensure that no other transaction can update or delete the same
row at the same time.
Transaction blocking
When a transaction attempts to carry out an operation, but is forbidden by a
lock held by another transaction, a conflict arises and the progress of the
transaction attempting to carry out the operation is impeded.
“Two-phase locking” on page 144 describes deadlock, which occurs when
two or more transactions are blocked by each other in such a way that none
can proceed.
Sometimes a set of transactions arrive at a state where none of them can
proceed. For more information, see “Deadlock” on page 115.
If two transactions have each acquired a read lock on a single row, the
behavior when one of them attempts to modify that row depends on the
database setting BLOCKING. To modify the row, that transaction must
block the other, yet it cannot do so while the other transaction has it blocked.
♦ If BLOCKING is ON (the default), then the transaction that attempts to
write waits until the other transaction releases its read lock. At that time,
the write goes through.
♦ If BLOCKING has been set to OFF, then the transaction that attempts to
write receives an error.
When BLOCKING is OFF, the statement terminates instead of waiting and
any partial changes it has made are rolled back. In this event, try executing
the transaction again later.
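For example, a connection that prefers to receive an error rather than wait might set the option as follows:
SET TEMPORARY OPTION BLOCKING = 'OFF'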
Blocking is more likely to occur at higher isolation levels because more
locking and more checking is done. Higher isolation levels usually provide
less concurrency. How much less depends on the individual natures of the
concurrent transactions.
For more information about the BLOCKING option, see “BLOCKING
option [database]” [ASA Database Administration Guide, page 597].
Deadlock
Choosing isolation levels
The choice of isolation level depends on the kind of task an application is
carrying out. This section gives some guidelines for choosing isolation
levels.
When you choose an appropriate isolation level you must balance the need
for consistency and accuracy with the need for concurrent transactions to
proceed unimpeded. If a transaction involves only one or two specific values
in one table, it is unlikely to interfere as much with other processes as one
which searches many large tables and may need to lock many rows or entire
tables and may take a very long time to complete.
For example, if your transactions involve transferring money between bank
accounts or even checking account balances, you will likely want to do your
utmost to ensure that the information you return is correct. On the other
hand, if you just want a rough estimate of the proportion of inactive
accounts, then you may not care whether your transaction waits for others or
not and indeed may be willing to sacrifice some accuracy to avoid interfering
with other users of the database.
Furthermore, a transfer may affect only the two rows which contain the two
account balances, whereas all the accounts must be read in order to calculate
the estimate. For this reason, the transfer is less likely to delay other
transactions.
Adaptive Server Anywhere provides four levels of isolation: levels 0, 1, 2,
and 3. Level 3 provides complete isolation and ensures that transactions are
interleaved in such a manner that the schedule is serializable.
Serializable schedules
To process transactions concurrently, the database server must execute some
component statements of one transaction, then some from other transactions,
before continuing to process further operations from the first. The order in
which the component operations of the various transactions are interleaved
is called the schedule.
Applying transactions concurrently in this manner can result in many
possible outcomes, including the three particular inconsistencies described
in the previous section. Sometimes, the final state of the database also could
have been achieved had the transactions been executed sequentially,
meaning that one transaction was always completed in its entirety before the
next was started. A schedule is called serializable whenever executing the
transactions sequentially, in some order, could have left the database in the
same state as the actual schedule.
this combination ensures cursor stability without greatly increasing locking
requirements. Adaptive Server Anywhere achieves this benefit through the
early release of read locks acquired for the present row of a cursor. These
locks must persist until the end of the transaction at either level two or three
in order to guarantee repeatable reads.
For example, a transaction that updates inventory levels through a cursor is
particularly suited to this level, because each of the adjustments to inventory
levels as items are received and sold would not be lost, yet these frequent
adjustments would have minimal impact on other transactions.
Typical level 2 transactions   At isolation level 2, rows that match your criterion cannot be changed by other transactions. You can thus employ this level when you must read rows more than once and rely on the fact that rows contained in your first result set won't change.
Because of the relatively large number of read locks required, you should
use this isolation level with care. As with level 3 transactions, careful design
of your database and indexes reduce the number of locks acquired and hence
can improve the performance of your database significantly.
Typical level 3 transactions   Isolation level 3 is appropriate for transactions that demand the most in security. The elimination of phantom rows lets you perform multi-step
operations on a set of rows without fear that new rows will appear partway
through your operations and corrupt the result.
However much integrity it provides, isolation level 3 should be used
sparingly on large systems that are required to support a large number of
concurrent transactions. Adaptive Server Anywhere places more locks at
this level than at any other, raising the likelihood that one transaction will
impede the process of many others.
For example, since all rows read must be locked whether or not they match
the search criteria, the database server is free to combine the operation of
reading rows and placing locks.
Isolation level tutorials
The different isolation levels behave in very different ways, and which one
you will want to use depends on your database and on the operations you are
carrying out. The following set of tutorials will help you determine which
isolation levels are suitable for different tasks.
Tip:
Before altering your database in the following way, it is prudent to test the
change by using SELECT in place of UPDATE.
In this example, you will play the role of two people, both using the
demonstration database concurrently.
1. Start Interactive SQL.
2. Connect to the sample database as the Sales Manager:
♦ In the Connect dialog, choose the ODBC data source ASA 9.0 Sample.
♦ On the Advanced tab, type the following string to make the window
easier to identify:
ConnectionName=Sales Manager
Click OK to connect.
3. Start a second instance of Interactive SQL.
4. Connect to the sample database as the Accountant:
♦ In the Connect dialog, choose the ODBC data source ASA 9.0 Sample.
♦ On the Advanced tab, type the following string to make the window
easier to identify:
ConnectionName=Accountant
♦ Click OK to connect.
5. As the Sales Manager, raise the price of all the tee shirts by $0.95:
♦ In the window labeled Sales Manager, execute the following
commands:
SELECT id, name, unit_price
FROM product;
UPDATE PRODUCT
SET unit_price = unit_price + 95
WHERE NAME = ’Tee Shirt’
The result lists the id, name, and unit_price of each product.
☞ You can eliminate dirty reads and other inconsistencies explained in
“Isolation levels and consistency” on page 108.
7. As the Sales Manager, fix the error by rolling back your first changes and
entering the correct UPDATE command. Check that your new values are
correct.
ROLLBACK;
UPDATE product
SET unit_price = unit_price + 0.95
WHERE NAME = ’Tee Shirt’;
8. The Accountant does not know that the amount he calculated was in
error. You can see the correct value by executing his SELECT statement
again in his window.
SELECT SUM( quantity * unit_price )
AS inventory
FROM product;
inventory
6687.15
9. Finish the transaction in the Sales Manager’s window. She would enter a
COMMIT statement to make her changes permanent, but you may wish
to enter a ROLLBACK, instead, to avoid changing the copy of the
demonstration database on your machine.
ROLLBACK;
♦ Click OK to connect.
3. Start a second instance of Interactive SQL.
♦ Click OK to connect.
5. Set the isolation level to 1 for the Accountant’s connection by executing
the following command.
SET TEMPORARY OPTION ISOLATION_LEVEL = 1
6. Set the isolation level to 1 in the Sales Manager’s window by executing
the following command:
SET TEMPORARY OPTION ISOLATION_LEVEL = 1
8. The Sales Manager decides to introduce a new sale price for the plastic
visor. As the Sales Manager, execute the following command:
SELECT id, name, unit_price FROM product
WHERE name = ’Visor’;
UPDATE product
SET unit_price = 5.95 WHERE id = 501;
COMMIT;
9. Compare the price of the visor in the Sales Manager window with the
price for the same visor in the Accountant window. The Accountant
window still displays the old price, even though the Sales Manager has
entered the new price and committed the change.
This inconsistency is called a non-repeatable read, because if the
Accountant did the same SELECT a second time in the same transaction,
he wouldn’t get the same results. Try it for yourself. As the Accountant,
execute the select command again. Observe that the Sales Manager’s sale
price now displays.
SELECT id, name, unit_price
FROM product
11. The Sales Manager decides that it would be better to delay the sale on the
plastic visor until next week so that she won’t have to give the lower price
on a big order that she’s expecting will arrive tomorrow. In her window,
try to execute the following statements. The command will start to
execute, and then her window will appear to freeze.
UPDATE product
SET unit_price = 7.00
WHERE id = 501
Observe that as soon as the database server executes this statement, the
Sales Manager’s transaction completes.
13. The Sales Manager can finish now. She wishes to commit her change to
restore the original price.
COMMIT
Types of locks and different isolation levels   When you upgraded the Accountant's isolation from level 1 to level 2, the database server used read locks where none had previously been acquired. In
general, each isolation level is characterized by the types of locks needed
and by how locks held by other transactions are treated.
At isolation level 0, the database server needs only write locks. It makes use
of these locks to ensure that no two transactions make modifications that
conflict. For example, a level 0 transaction acquires a write lock on a row
before it updates or deletes it, and inserts any new rows with a write lock
already in place.
Level 0 transactions perform no checks on the rows they are reading. For
example, when a level 0 transaction reads a row, it doesn’t bother to check
what locks may or may not have been acquired on that row by other
transactions. Since no checks are needed, level 0 transactions are
particularly fast. This speed comes at the expense of consistency. Whenever
they read a row which is write locked by another transaction, they risk
returning dirty data.
At level 1, transactions check for write locks before they read a row.
Although one more operation is required, these transactions are assured that
all the data they read is committed. Try repeating the first tutorial with the
isolation level set to 1 instead of 0. You will find that the Accountant’s
computation cannot proceed while the Sales Manager’s transaction, which
updates the price of the tee shirts, remains incomplete.
When the Accountant raised his isolation to level 2, the database server
began using read locks. From then on, it acquired a read lock for his
transaction on each row that matched his selection.
Transaction blocking In the above tutorial, the Sales Manager window froze during the execution
of her UPDATE command. The database server began to execute her
command, then found that the Accountant’s transaction had acquired a read
lock on the row that the Sales Manager needed to change. At this point, the
database server simply paused the execution of the UPDATE. Once the
2. Set the isolation level to 2 for the Accountant window by executing the
following command.
SET TEMPORARY OPTION ISOLATION_LEVEL = 2;
3. In the Accountant window, enter the following command to list all the
departments.
SELECT * FROM department
ORDER BY dept_id;
The final command creates the new entry for the new department. It
appears as a new row at the bottom of the table in the Sales Manager’s
window.
5. The Accountant, however, is not aware of the new department. At
isolation level 2, the database server places locks to ensure that no row
changes, but places no locks that stop other transactions from inserting
new rows.
The Accountant will only discover the new row if he executes his
SELECT command again. In the Accountant’s window, execute the
SELECT statement again. You will see the new row appended to the
table.
SELECT *
FROM department
ORDER BY dept_id;
the database server acquires locks only on the rows that he is using. Other
rows are left untouched and hence there is nothing to prevent the Sales
Manager from inserting a new row.
6. The Accountant would prefer to avoid such surprises in future, so he
raises the isolation level of his current transaction to level 3. Enter the
following commands for the Accountant.
SET TEMPORARY OPTION ISOLATION_LEVEL = 3
SELECT *
FROM department
ORDER BY dept_id
7. The Sales Manager would like to add a second department to handle sales
initiatives aimed at large corporate partners. Execute the following
command in the Sales Manager’s window.
INSERT INTO department
(dept_id, dept_name, dept_head_id)
VALUES(700, 'Major Account Sales', 902)
The Sales Manager’s window will pause during execution because the
Accountant’s locks block the command. Click the Interrupt the SQL
Statement button on the toolbar (or choose Stop from the SQL menu) to
interrupt this entry.
8. To avoid changing the demonstration database that comes with Adaptive
Server Anywhere, you should roll back the insertion of the new
departments. Execute the following command in the Sales Manager’s
window:
ROLLBACK
When the Accountant raised his isolation to level 3 and again selected all
rows in the department table, the database server placed anti-insert locks on
each row in the table, and one extra phantom lock to avoid insertion at the
end of the table. When the Sales Manager attempted to insert a new row at
the end of the table, it was this final lock that blocked her command.
Notice that the Sales Manager’s command was blocked even though the
Sales Manager is still connected at isolation level 2. The database server
places anti-insert locks, like read locks, as demanded by the isolation level
and statements of each transaction. Once placed, these locks must be
respected by all other concurrent transactions.
☞ For more information on locking, see “How locking works” on
page 135.
Practical locking implications tutorial
The following continues the same scenario. In this tutorial, the Accountant
and the Sales Manager both have tasks that involve the sales order and sales
order items tables. The Accountant needs to verify the amounts of the
commission checks paid to the sales employees for the sales they made
during the month of April 2001. The Sales Manager notices that a few
orders have not been added to the database and wants to add them.
Their work demonstrates phantom locking. A phantom lock is a shared
lock placed on an indexed scan position to prevent phantom rows. When a
transaction at isolation level 3 selects rows which match a given criterion,
the database server places anti-insert locks to stop other transactions from
inserting rows which would also match. The number of locks placed on your
behalf depends both on the search criterion and on the design of your
database.
If you have not done so, do steps 1 through 3 of the previous tutorial which
describe how to start two instances of Interactive SQL.
1. Set the isolation level to 2 in both the Sales Manager window and the
Accountant window by executing the following command.
SET TEMPORARY OPTION ISOLATION_LEVEL = 2
3. The Sales Manager notices that a big order sold by Philip Chin was not
entered into the database. Philip likes to be paid his commission
promptly, so the Sales Manager enters the missing order, which was
placed on April 25.
In the Sales Manager’s window, enter the following commands. The
Sales order and the items are entered in separate tables because one order
can contain many items. You should create the entry for the sales order
before you add items to it. To maintain referential integrity, the database
server allows a transaction to add items to an order only if that order
already exists.
INSERT into sales_order
VALUES ( 2653, 174, '2001-04-22', 'r1',
'Central', 129);
INSERT into sales_order_items
VALUES ( 2653, 1, 601, 100, '2001-04-25' );
COMMIT;
4. The Accountant has no way of knowing that the Sales Manager has just
added a new order. Had the new order been entered earlier, it would have
been included in the calculation of Philip Chin’s April sales.
In the Accountant’s window, calculate the April sales totals again. Use
the same command, and observe that Philip Chin’s April sales changes to
$4560.00.
just entered might be found in the second search and marked as paid,
even though it was not included in Philip’s total April sales!
Because you set the isolation to level 3, the database server will
automatically place anti-insert locks to ensure that the Sales Manager
can’t insert April order items until the Accountant finishes his transaction.
The Sales Manager’s window will hang; the operation will not complete.
Click the Interrupt the SQL Statement button on the toolbar (or choose
Stop from the SQL menu) to interrupt this entry.
10. The Sales Manager can’t enter the order in April, but you might think that
she could still enter it in May.
Change the date of the command to May 05 and try again.
INSERT INTO sales_order
VALUES ( 2653, 174, '2001-05-05', 'r1',
'Central', 129)
The Sales Manager’s window will hang again. Click the Interrupt the
SQL Statement button on the toolbar (or choose Stop from the SQL
menu) to interrupt this entry. Although the database server places no
more locks than necessary to prevent insertions, these locks have the
potential to interfere with a large number of other transactions.
The database server places locks in table indices. For example, it places a
phantom lock in an index so a new row cannot be inserted immediately
before it. However, when no suitable index is present, it must lock every
row in the table.
In some situations, anti-insert locks may block some insertions into a
table, yet allow others.
11. The Sales Manager wishes to add a second item to order 2651. Use the
following command.
INSERT INTO sales_order_items
VALUES ( 2651, 2, 302, 4, '2001-05-22' )
All goes well, so the Sales Manager decides to add the following item to
order 2652 as well.
INSERT INTO sales_order_items
VALUES ( 2652, 2, 600, 12, '2001-05-25' )
The Sales Manager’s window will hang. Click the Interrupt the SQL
Statement button on the toolbar (or choose Stop from the SQL menu) to
interrupt this entry.
12. Conclude this tutorial by undoing any changes to avoid changing the
demonstration database. Enter the following command in the Sales
Manager’s window.
ROLLBACK
You may now close both windows.
the structure of a table, perhaps inserting a new column, could greatly
impact other transactions. In such a case, it is essential to limit the access of
other transactions to prevent errors.
Row orderings You can use an index to order rows based on a particular criterion
established when the index was constructed.
When there is no index, Adaptive Server Anywhere orders rows by their
physical placement on disk. In the case of a sequential scan, the specific
ordering is defined by the internal workings of the database server. You
should not rely on the order of rows in a sequential scan. From the point of
view of scanning the rows, however, Adaptive Server Anywhere treats the
request similarly to an indexed scan, albeit using an ordering of its own
choosing. It can place locks on positions in the scan as it would were it using
an index.
Through locking a scan position, a transaction prevents some actions by
other transactions relating to a particular range of values in that ordering of
the rows. Insert and anti-insert locks are always placed on scan positions.
For example, a transaction might delete a row, hence deleting a particular
primary key value. Until this transaction either commits the change or rolls
it back, it must protect its right to do either. In the case of a deleted row, it
must ensure that no other transaction can insert a row using the same
primary key value, hence making a rollback operation impossible. A lock on
the scan position this row occupied reserves this right while having the least
impact on other transactions.
Types of locks
Adaptive Server Anywhere uses four distinct types of locks to implement its
locking scheme and ensure appropriate levels of isolation between
transactions:
♦ read lock (shared)
♦ phantom lock or anti-insert lock (shared)
♦ write lock (exclusive)
♦ insert lock or anti-phantom lock (shared)
Depending on the isolation level you select, the database server will use
some or all of them to maintain the degree of consistency you require.
The above types of locks have the following uses:
♦ A transaction acquires a write lock whenever it inserts, updates, or
deletes a row. No other transaction can obtain either a read or a write lock
on the same row when a write lock is set. A write lock is an exclusive
lock.
♦ A transaction can acquire a read lock when it reads a row. Several
transactions can acquire read locks on the same row (a read lock is a
shared or nonexclusive lock). Once a row has been read locked, no other
transaction can obtain a write lock on it. Thus, a transaction can ensure
that no other transaction modifies or deletes a row by acquiring a read
lock.
♦ An anti-insert lock, or phantom lock, is placed on a scan position to
prevent phantom rows. It prevents other transactions from inserting a row
into a table immediately before the row which is anti-insert locked.
Eliminating phantom rows for lookups that use an index requires an
anti-insert lock on each row read, plus one extra anti-insert lock to prevent
insertions into the index at the end of the result set. Lookups that do not
use an index require an anti-insert lock on every row in the table to prevent
insertions from altering the result set, and so can have a serious effect on
concurrency.
♦ An insert lock, or anti-phantom lock, is placed on a scan position to
reserve the right to insert a row. Once one transaction acquires an insert
lock on a row, no other transaction can acquire an anti-insert lock on the
same row. A read lock on the corresponding row is always acquired at the
same time as an insert lock to ensure that no other process can update or
destroy the row, thereby bypassing the insert lock.
Adaptive Server Anywhere uses these four types of locks as necessary to
ensure the level of consistency that you require. You do not need to
explicitly request the use of a particular lock. Instead, you control the level
of consistency, as is explained in the next section. Knowledge of the types of
locks will guide you in choosing isolation levels and understanding the
impact of each level on performance.
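For example, the following minimal sketch assumes two Interactive SQL connections to the sample database used in the tutorials earlier in this chapter. It shows a shared read lock excluding an exclusive write lock; the new department name is illustrative only.

-- Connection A: acquire a read lock on one row of the department table
SET TEMPORARY OPTION ISOLATION_LEVEL = 2;
SELECT * FROM department WHERE dept_id = 100;

-- Connection B: this UPDATE needs a write lock on the same row, so it
-- blocks until connection A commits or rolls back
UPDATE department
SET dept_name = 'Research and Development'
WHERE dept_id = 100;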
Exclusive versus shared locks These four types of locks each fall into one of two categories:
♦ Exclusive locks Only one transaction can hold an exclusive lock on a
row of a table at one time. No transaction can obtain an exclusive lock
while any other transaction holds a lock of any type on the same row.
Once a transaction acquires an exclusive lock, requests to lock the row by
other transactions will be denied.
Write locks are exclusive.
♦ Shared locks Any number of transactions may acquire shared locks on
any one row at the same time. Shared locks are sometimes referred to as
non-exclusive locks.
Read locks, insert locks, and anti-insert locks are shared.
Only one transaction should change any one row at one time. Otherwise,
two simultaneous transactions might try to change one value to two different
new ones. Hence, it is important that a write lock be exclusive.
By contrast, no difficulty arises if more than one transaction wants to read a
row. Since neither is changing it, there is no conflict of interest. Hence, read
locks may be shared.
You may apply similar reasoning to anti-insert and insert locks. Many
transactions can prevent the insertion of a row in a particular scan position
by each acquiring an anti-insert lock. Similar logic applies for insert locks.
When a particular transaction requires exclusive access, it can easily achieve
exclusive access by obtaining both an anti-insert and an insert lock on the
same row. These locks do not conflict when they are held by the same
transaction.
Which specific locks conflict? The following table identifies the combinations of locks that conflict.

              read        write       anti-insert   insert
read                      conflict
anti-insert                                         conflict
insert                                conflict
These conflicts arise only when the locks are held by different transactions.
For example, one transaction can obtain both anti-insert and insert locks on a
single scan position to obtain exclusive access to a location.
user to interpret the result of these queries with this limitation in mind.
SELECT statements at isolation level 1 You may be surprised to learn that Adaptive Server Anywhere
uses almost no more locks when running a transaction at isolation level 1 than it does at
isolation level 0. Indeed, the database server modifies its operation in only
two ways.
The first difference in operation has nothing to do with acquiring locks, but
rather with respecting them. At isolation level 0, a transaction is free to read
any row, whether or not another transaction has acquired a write lock on it.
By contrast, before reading each row an isolation level 1 transaction must
check whether a write lock is in place. It cannot read past any write-locked
rows because doing so might entail reading dirty data.
The second difference in operation creates cursor stability. Cursor stability is
achieved by acquiring a read lock on the current row of a cursor. This read
lock is released when the cursor is moved. More than one row may be
affected if the contents of the cursor are the result of a join. In this case, the
database server acquires read locks on all rows which have contributed
information to the cursor’s current row and removes all these locks as soon
as another row of the cursor is selected as current.
SELECT statements at isolation level 2 At isolation level 2, Adaptive Server Anywhere modifies its
procedures to ensure that your reads are repeatable. If your SELECT command returns
values from every row in a table, then the database server acquires a read
lock on each row of the table as it reads it. If, instead, your SELECT
contains a WHERE clause or another condition that restricts the rows
selected, then the database server reads each row, tests the values in
the row against your criterion, and acquires a read lock on the row only if it
meets your criterion.
As at all isolation levels, the locks acquired at level 2 include all those set at
levels 1 and 0. Thus, cursor stability is again ensured and dirty reads are not
permitted.
SELECT statements at isolation level 3 When operating at isolation level 3, Adaptive Server Anywhere
is obligated to ensure that all schedules are serializable. In particular, in addition to the
requirements imposed at each of the lower levels, it must eliminate phantom
rows.
To accommodate this requirement, the database server uses read locks and
anti-insert locks. When you make a selection, the database server acquires a
read lock on each row that contributes information to your result set. Doing
so ensures that no other transactions can modify that material before you
have finished using it.
This requirement is similar to the procedures that the database server uses at
isolation level 2, but differs in that a lock must be acquired for each row
read, whether or not it meets any attached criteria. For example, if you select
the names of all employees in the sales department, then the server must
lock all the rows which contain information about a sales person, whether
the transaction is executing at isolation level 2 or 3. At isolation level 3,
however, it must also acquire read locks on each of the rows of employees
which are not in the sales department. Otherwise, someone else accessing
the database could potentially transfer another employee to the sales
department while you were still using your results.
The fact that a read lock must be acquired on each row whether or not it
meets your criteria has two important implications.
♦ The database server may need to place many more locks than would be
necessary at isolation level 2.
♦ The database server can operate a little more efficiently: It can
immediately acquire a read lock on each row as it reads it, since the
locks must be placed whether or not the information in the row is
accepted.
The number of anti-insert locks the server places can vary greatly and
depends upon your criteria and on the indexes available in the table.
Suppose you select information about the employee with Employee ID 123.
If the employee ID is the primary key of the employee table, then the
database server can economize its operations. It can use the index, which is
automatically built for a primary key, to locate the row efficiently. In
addition, there is no danger that another transaction could change another
Employee’s ID to 123 because primary key values must be unique. The
server can guarantee that no second employee is assigned that ID number
simply by acquiring a read lock on only the one row containing information
about the employee with that number.
By contrast, the database server would acquire more locks were you instead
to select all the employees in the sales department. Since any number of
employees could be added to the department, the server will likely have to
read every row in the employee table and test whether each person is in
sales. If this is the case, both read and anti-insert locks must be acquired for
each row.
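The following sketch restates the two cases just described; it assumes the sample employee table and an Interactive SQL connection.

SET TEMPORARY OPTION ISOLATION_LEVEL = 3;

-- Primary key lookup: the primary key index lets the server satisfy the
-- query with a read lock on the single matching row and few anti-insert locks
SELECT * FROM employee WHERE emp_id = 123;

-- Criterion without a usable index: the server may have to read, read lock,
-- and anti-insert lock every row in the table
SELECT emp_fname, emp_lname FROM employee WHERE dept_id = 200;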
1. Make a location in memory to store the new row. The location is initially
hidden from the rest of the database, so there is as yet no concern that
another transaction could access it.
2. Fill the new row with any supplied values.
3. Write lock the new row.
4. Place an insert lock in the table to which the row is being added. Recall
that insert locks conflict with anti-insert locks, so once the insert lock is
acquired, no other transaction can block the insertion by acquiring an
anti-insert lock on the same scan position.
5. Insert the row into the table. Other transactions can now, for the first
time, see that the new row exists. They can’t modify or delete it, though,
because of the write lock acquired earlier.
6. Update all affected indexes and verify both referential integrity and
uniqueness, where appropriate. Verifying referential integrity means
ensuring that no foreign key points to a primary key that does not exist.
Primary key values must be unique. Other columns may also be defined
to contain only unique values, and if any such columns exist, uniqueness
is verified.
8. Insert other rows as required, if you have selected the cascade option, and
fire triggers.
Uniqueness You can ensure that all values in a particular column, or combination of
columns, are unique. The database server always performs this task by
building an index for the unique column, even if you do not explicitly create
one.
In particular, all primary key values must be unique. The database server
automatically builds an index for the primary key of every table. Thus, you
should not ask the database server to create an index on a primary key, as
that index would be a redundant index.
Orphans and referential integrity A foreign key is a reference to a primary key, usually in another
table. When that primary key doesn’t exist, the offending foreign key is called an
orphan. Adaptive Server Anywhere automatically ensures that your
database contains no orphans. This process is referred to as verifying
referential integrity. The database server verifies referential integrity by
counting orphans.
WAIT FOR COMMIT You can ask the database server to delay verifying referential integrity until the
end of your transaction. In this mode, you can insert one row which contains
a foreign key, then insert a second row which contains the missing primary
key. You must perform both operations in the same transaction. Otherwise,
the database server will not allow your operations.
To request that the database server delay referential integrity checks until
commit time, set the value of the option WAIT_FOR_COMMIT to ON. By
default, this option is OFF. To turn it on, issue the following command:
SET OPTION WAIT_FOR_COMMIT = ON;
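For example, the following sketch uses two hypothetical tables, parent and child, to insert a child row before the parent row it references; the orphan check is deferred until the COMMIT.

CREATE TABLE parent ( pk INT PRIMARY KEY );
CREATE TABLE child ( id INT PRIMARY KEY,
                     pk INT REFERENCES parent ( pk ) );

SET OPTION WAIT_FOR_COMMIT = ON;

INSERT INTO child VALUES ( 1, 10 );   -- parent row 10 does not exist yet
INSERT INTO parent VALUES ( 10 );     -- supply the missing primary key
COMMIT;                               -- referential integrity is verified here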
operations. The amount of work that the database server needs to do is much
less if the value you are changing is not part of a primary or foreign key. It is
lower still if it is not contained in an index, either explicitly or implicitly
because you have declared that attribute unique.
The operation of verifying referential integrity during an UPDATE operation
is no simpler than when the verification is performed during an INSERT.
In fact, when you change the value of a primary key, you may create
orphans. When you insert the replacement value, the database server must
check for orphans once more.
phantom lock may suffice. Other arrangements can quickly escalate the
number of locks required. For example, the table may have no primary key
or other index associated with any of the attributes. Since the rows in a table
have no fundamental ordering, the only way of preventing inserts may be to
anti-insert lock the entire table.
Deleting a row can mean acquiring a great many locks. You can minimize
the effect on concurrency in your database in a number of ways. As
described earlier, indexes and primary keys reduce the number of locks
required because they impose an ordering on the rows in the table. The
database server automatically takes advantage of these orderings. Instead of
acquiring locks on every row in the table, it can simply lock the next row.
Without the index, the rows have no order and thus the concept of a next row
is meaningless.
The database server acquires anti-insert locks on the row following the row
deleted. Should you delete the last row of a table, the database server simply
places the anti-insert lock on an invisible end row. In fact, if the table
contains no index, the number of anti-insert locks required is one more than
the number of rows in the table.
Anti-insert locks and read locks While one or more anti-insert locks exclude an insert lock and one
or more read locks exclude a write lock, no interaction exists between
anti-insert/insert locks and read/write locks. For example, although a write
lock cannot be acquired on a row that contains a read lock, it can be acquired
on a row that has only an anti-insert lock. More options are open to the
database server because of this flexible arrangement, but it means that the
server must generally take the extra precaution of acquiring a read lock
when acquiring an anti-insert lock. Otherwise, another transaction could
delete the row.
Two-phase locking
Often, the general information about locking provided in the earlier sections
will suffice to meet your needs. There are times, however, when you may
benefit from more knowledge of what goes on inside the database server
when you perform basic types of operations. This knowledge will provide
you with a better basis from which to understand and predict potential
problems that users of your database may encounter.
Two-phase locking is important in the context of ensuring that schedules are
serializable. The two-phase locking protocol specifies a procedure each
transaction follows.
This protocol is important because, if observed by all transactions, it will
guarantee a serializable, and thus correct, schedule. It may also help you
1. Before operating on any row, a transaction must acquire a lock on that row.
2. After releasing a lock, a transaction must never acquire any more locks.
In practice, a transaction normally holds locks until it terminates with either
a COMMIT or ROLLBACK statement. Releasing locks before the end of the
transaction would make it impossible to roll back the changes whenever doing
so would necessitate operating on rows to return them to an earlier state.
The two-phase locking protocol allows the statement of the following
important theorem: if every transaction obeys the two-phase locking
protocol, then every possible interleaved execution of those transactions is
serializable.
Isolation level Read locks
0 None
level 1, transactions acquire a read lock on a row only when it becomes the
current row of a cursor. Under isolation level 1, however, when that row is
no longer current, the lock is released. This behavior is acceptable because
the database server does not need to guarantee repeatable reads at isolation
level 1.
☞ For more information about isolation levels, see “Choosing isolation
levels” on page 116.
Special optimizations
The previous sections describe the locks acquired when all transactions are
operating at a given isolation level. For example, when all transactions are
running at isolation level 2, locking is performed as described in the
appropriate section, above.
In practice, your database is likely to need to process multiple transactions
that are at different levels. A few transactions, such as the transfer of money
between accounts, must be serializable and so run at isolation level 3. For
other operations, such as updating an address or calculating average daily
sales, a lower isolation level will often suffice.
While the database server is not processing any transactions at level 3, it
optimizes some operations so as to improve performance. In particular,
many extra anti-insert and insert locks are often necessary to support a
level 3 transaction. Under some circumstances, the database server can avoid
either placing or checking for some types of locks when no level 3
transactions are present.
For example, the database server uses anti-insert locks for two distinct
purposes:
1. To ensure that deletes in tables with unique attributes can be rolled back.
2. To eliminate phantom rows in level 3 transactions.
If no level 3 transactions are using a particular table, then the database server
need not place anti-insert locks in the index of a table that contains no
unique attributes. If, however, even one level 3 transaction is present, all
transactions, even those at level 0, must place anti-insert locks so that the
level 3 transactions can identify their operations.
Naturally, the database server always attaches notes to a table when it
attempts the types of optimizations described above. Should a level 3
transaction suddenly start, you can be confident that the necessary locks will
be put in place for it.
You may have little control over the mix of isolation levels in use at one time
as so much will depend on the particular operations that the various users of
your database wish to perform. Where possible, however, you may wish to
select the time that level 3 operations execute because they have the potential
to cause significant slowing of database operations. The impact is magnified
because the database server is forced to perform extra operations for
lower-level operations.
will not match that produced on another. They can therefore be used as
keys in replication and synchronization environments.
For more information about generating unique identifiers, see “The
NEWID default” on page 86.
On inserts into the table, if a value is not specified for the autoincrement
column, a unique value is generated. If a value is specified, it will be
used. If the value is larger than the current maximum value for the
column, that value will be used as a starting point for subsequent inserts.
The value of the most recently inserted row in an autoincrement column
is available as the global variable @@identity.
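A brief sketch of this behavior, using a hypothetical table:

CREATE TABLE event_log (
    id    INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
    descr VARCHAR(100)
);

INSERT INTO event_log ( descr ) VALUES ( 'first entry' );         -- id 1 generated
INSERT INTO event_log ( id, descr ) VALUES ( 50, 'explicit id' ); -- 50 is used
INSERT INTO event_log ( descr ) VALUES ( 'third entry' );         -- id 51 generated

SELECT @@identity;   -- most recently inserted autoincrement value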
Replication and concurrency
Some computers on your network might be portable computers that people
take away from the office or which are occasionally connected to the
network. There may be several database applications that they would like to
use while not connected to the network.
Database replication is the ideal solution to this problem. Using
SQL Remote or MobiLink synchronization, you can publish information in a
consolidated, or master, database to any number of other computers. You
can control precisely the information replicated on any particular computer.
Any person can receive particular tables, or even portions of the rows or
columns of a table. By customizing the information each receives, you can
ensure that their copy of the database is no larger than necessary to contain
the information they require.
☞ Extensive information on SQL Remote replication and MobiLink
synchronization is provided in the separate manuals entitled SQL Remote
User’s Guide and MobiLink Synchronization User’s Guide. The information
in this section is, thus, not intended to be complete. Rather, it introduces
concepts related directly to locking and concurrency considerations.
SQL Remote and MobiLink allow replicated databases to be updated from a
central, consolidated database, as well as updating this same central data as
the results of transactions processed on the remote machine. Since updates
can occur in either direction, this ability is referred to as bi-directional
replication.
Since the results of transactions can affect the consolidated database,
whether they are processed on the central machine or on a remote one, the
effect is that of allowing concurrent transactions.
Transactions may happen at the same time on different machines. They may
even involve the same data. In this case, though, the machines may not be
physically connected. No means may exist by which the remote machine can
contact the consolidated database to set any form of lock or identify which
rows have changed. Thus, locks cannot prevent inconsistencies as they do
when all transactions are processed by a single server.
An added complication is introduced by the fact that any given remote
machine may not hold a full copy of the database. Consider a transaction
executed directly on the main, consolidated database. It may affect rows in
two or more tables. The same transaction might not execute on a remote
database, as there is no guarantee that all of the affected tables are
replicated on that machine. Even if the same tables exist, they may not
contain exactly the same information, depending upon how recently the
☞ SQL Remote provides the tools and programming facilities you need to
take full advantage of database replication. For further information, see the
SQL Remote User’s Guide and the MobiLink Synchronization User’s Guide.
Summary
Transactions and locking are perhaps second only in importance to relations
between tables. The integrity and performance of any database can benefit
from the judicious use of locking and careful construction of transactions.
Both are essential to creating databases that must execute a large number of
commands concurrently.
Transactions group SQL statements into logical units of work. You may end
each by either rolling back any changes you have made or by committing
these changes and so making them permanent.
Transactions are essential to data recovery in the event of system failure.
They also play a pivotal role in interweaving statements from concurrent
transactions.
To improve performance, multiple transactions must be executed
concurrently. Each transaction is composed of component SQL statements.
When two or more transactions are to be executed concurrently, the database
server must schedule the execution of the individual statements. Concurrent
transactions have the potential to introduce new, inconsistent results that
could not arise were these same transactions executed sequentially.
Many types of inconsistencies are possible, but four typical types are
particularly important because they are mentioned in the ISO SQL/92
standard and the isolation levels are defined in terms of them.
♦ Dirty read One transaction reads data modified, but not yet
committed, by another.
♦ Non-repeatable read A transaction reads the same row twice and gets
different values.
♦ Phantom row A transaction selects rows, using a certain criterion,
twice and finds new rows in the second result set.
♦ Lost Update One transaction’s changes to a row are completely lost
because another transaction is allowed to save an update based on earlier
data.
A schedule is called serializable whenever the effect of executing the
statements according to the schedule is the same as could be achieved by
executing each of the transactions sequentially. Schedules are said to be
correct if they are serializable. A serializable schedule will cause none of
the above inconsistencies.
Locking controls the amount and types of interference permitted. Adaptive
Server Anywhere provides you with four levels of locking: isolation levels 0,
1, 2, and 3. At the highest isolation, level 3, Adaptive Server Anywhere
guarantees that the schedule is serializable, meaning that the effect of
executing all the transactions is equivalent to running them sequentially.
Unfortunately, locks acquired by one transaction may impede the progress of
other transactions. Because of this problem, lower isolation levels are
desirable whenever the inconsistencies they may allow are tolerable.
Increased isolation to improve data consistency frequently means lowering
the concurrency, the efficiency of the database at processing concurrent
transactions. You must frequently balance the requirements for consistency
against the need for performance to determine the best isolation level for
each operation.
Conflicting locking requirements between different transactions may lead to
blocking or deadlock. Adaptive Server Anywhere contains mechanisms for
dealing with both these situations, and provides you with options to control
them.
Transactions at higher isolation levels do not, however, always impact
concurrency. Other transactions will be impeded only if they require access
to locked rows. You can improve concurrency through careful design of your
database and transactions. For example, you can shorten the time that locks
are held by dividing one transaction into two shorter ones, or you might find
that adding an index allows your transaction to operate at higher isolation
levels with fewer locks.
The increased popularity of portable computers will frequently mean that
your database may need to be replicated. Replication is an extremely
convenient feature of Adaptive Server Anywhere, but it introduces new
considerations related to concurrency. These topics are covered in a separate
manual.
CHAPTER 5
Monitoring and Improving Performance
About this chapter This chapter describes how to monitor and improve the performance of your
database.
Contents
Fragmentation: page 198
Performance analysis tools
While improving database performance doesn’t have to be labour-intensive,
it’s always best to start with a plan. Evaluate the current performance of your
database, and consider all your options before changing anything. By
re-evaluating your database schema using Adaptive Server Anywhere’s
performance features and performance analysis tools, you can diagnose and
correct performance problems and keep your database performing at its
optimum.
Ultimately, how well your database performs depends heavily on the design
of its bones. And so, one of the most basic ways of improving
performance is good schema design. The database schema is the
skeleton of your database, and includes definitions of such things as tables,
views, triggers, and the relationships between them. Re-evaluate your
database schema and make note of the following areas where small changes
can offer impressive gains.
A variety of tools are available to help you analyze and monitor the current
performance of your Adaptive Server Anywhere database. Tools include
request-level logging, procedure profiling, graphical plans, the Performance
Monitor and timing utilities.
Request-level logging
Request-level logging is a good starting point for performance analysis of a
specific application when it is not obvious whether the server or the client is
at fault. It is also useful in determining the specific request to the server that
might be responsible for problems.
Request-level logging records individual requests received from and responses
sent to an application. It’s most useful for determining what the server is
being asked to do by the application.
Logged information includes timestamps, connection ids, and request type,
for example. You can use the -zr database server option to specify what type
of information is logged. You can redirect the output to a file for further
analysis using the -zo option.
The sa_get_request_times ( [ request_log_filename [, connection_id ] ] )
stored procedure reads a request-level log and populates a global temporary
table satmp_request_time with statements from the log and their execution
times. The time recorded is straightforward for INSERT/UPDATE/DELETE
statements. For queries, the time recorded is the total elapsed time from
PREPARE to DROP (describe/open/fetch/close). That means you need to be
aware of any open cursors.
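A minimal sketch of the workflow follows; the executable name dbsrv9 and the file names are assumptions, so substitute those of your own installation. Start the server with request-level logging of SQL statements redirected to a file:

dbsrv9 -zr sql -zo c:\tmp\reqlog.txt asademo.db

After the application has run, load the log and inspect the recorded times in Interactive SQL:

CALL sa_get_request_times( 'c:\\tmp\\reqlog.txt' );
SELECT * FROM satmp_request_time;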
❖ To reset filtering
1. Use either of the following two statements, to reset either by connection
or by database:
call sa_server_option('requests_for_connection',-1)
call sa_server_option('requests_for_database',-1)
Outputting host variables to request-level logs Host variable values can be output to a request log.
❖ To include host variable values
1. To include host variable values in the request-level log:
♦ use the -zr server command line option with a value of sql+hostvars
♦ execute the following:
call sa_server_option('request_level_logging','sql+host')
Index Consultant
The Index Consultant recommends indexes that can help improve
performance for large workloads or complex queries. It takes as input either
a single query or a workload of queries, and recommends indexes to add to
the database as well as unused indexes to drop.
The Index Consultant creates a set of virtual indexes in the database. It
estimates query execution costs using those indexes to see which indexes
lead to improved execution plans. The Index Consultant evaluates multiple
column indexes as well as single-column indexes, and also investigates the
impact of clustered or unclustered indexes.
☞ For more information about the Index Consultant, see “Index Consultant
overview” on page 67.
Procedure profiling
Procedure profiling shows you how long it takes your stored procedures,
functions, events, and triggers to execute. You can also view the execution
time for each line of a procedure. Using the database profiling information,
you can determine which procedures can be fine-tuned to improve
performance within your database.
Procedure profiling can help you analyze specific database procedures
(including stored procedures, functions, events, and triggers) found to be
expensive via request-level logging. It can also help you discover expensive
hidden procedures, such as triggers, events, and nested stored procedure
calls. As well, it can help pinpoint potential problem areas within the body
of a procedure.
You can use stored procedures to view procedure profiling information that
has been gathered by the server. The sa_procedure_profile_summary
stored procedure provides information about all of the procedures within the
database. You can use this procedure to view the profiling data for stored
procedures, functions, events, and triggers within the same result set.
However, a better way to examine this information is to use Sybase Central.
Profiling can be enabled/disabled dynamically and the data it generates is
transient, stored in memory by the server. You can view it using the Profile
tab in Sybase Central. Once profiling is enabled, the database gathers
profiling information until you disable profiling or until the server is shut
down. Profiling information is cumulative, and accurate to 1 ms.
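A minimal sketch of a profiling session follows; the option name passed to sa_server_option is an assumption to check against your version's documentation.

-- Enable procedure profiling for the server
call sa_server_option( 'procedure_profiling', 'YES' );

-- ... run the workload you want to analyze ...

-- Summarize time spent in each procedure, function, event, and trigger
CALL sa_procedure_profile_summary();

-- Disable profiling again when you are done
call sa_server_option( 'procedure_profiling', 'NO' );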
Graphical plan
The graphical plan feature in Interactive SQL displays the execution plan for
a query. It is useful for diagnosing performance issues with specific queries.
For example, the information in the plan may help you decide where to add
an index to your database.
The graphical plan provides a great deal more information than the short or
long plans. You can choose to see the graphical plan either with or without
statistics. Both allow you to quickly view which parts of the plan have been
estimated as the most expensive. The graphical plan with statistics, though
more expensive to view, also provides the actual query execution statistics as
monitored by the server when the query is executed, and permits direct
comparison between the estimates used by the query optimizer in
constructing the access plan with the actual statistics monitored during
execution. Note, however, that the optimizer is often unable to precisely
estimate a query’s cost, so expect there to be differences. The graphical plan
is the default format for access plans.
You can obtain detailed information about the nodes in the plan by clicking
the node in the graphical diagram. The graphical plan with statistics shows
you all the estimates that are provided with the graphical plan, but also
shows actual runtime costs of executing the statement. To do this, the
statement must actually be executed. This means that there may be a delay
in accessing the plan for expensive queries. It also means that any parts of
your query such as deletes or updates are actually executed, although you
can perform a rollback to undo these changes.
Use the graphical plan with statistics when you are having performance
problems, and the estimated row count or run time differs from your
expectations. The graphical plan with statistics provides estimates and actual
statistics for you to compare. A large difference between actual and estimate
is a warning sign that the optimizer might not have sufficient information to
prepare correct estimates.
Following are some of the key statistics you can check in the graphical plan
with statistics, and some possible remedies:
♦ Row count measures the rows in the result set. If the estimated row count
is significantly different from the actual row count, the selectivity of
underlying predicates is probably incorrect.
♦ Accurate selectivity estimates are critical for the proper operation of the
query optimizer. For example, if the optimizer mistakenly estimates a
predicate to be highly selective (with, say, a selectivity of 5%), but in
reality, the predicate is much less selective (for example, 50%), then
performance may suffer. In general, estimates may not be precise.
However, a significantly large error does indicate a possible problem. If
the predicate is over a base column for which no histogram exists,
executing a CREATE STATISTICS statement to create a histogram may
correct the problem. If selectivity error remains a problem then, as a last
resort, you may wish to specify a user estimate of selectivity along with the
predicate in the query text, as sketched after this list.
♦ If the number of cache reads and cache hits are exactly the same, then
your entire database is in cache—an excellent thing. When reads are
greater than hits, it means that the server is attempting to go to cache but
failing, and that it must read from disk. In some cases, such as hash joins,
this is expected. In other cases, such as nested loops joins, a poor
cache-hit ratio may indicate a performance problem, and you may benefit
from increasing your cache size.
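The following sketch shows both remedies mentioned above for poor selectivity estimates. The table and column names come from the sample database and are illustrative only.

-- Build a histogram on a column that has none
CREATE STATISTICS employee ( dept_id );

-- As a last resort, supply an explicit selectivity estimate (here, 50 percent)
-- together with the predicate itself
SELECT emp_lname
FROM employee
WHERE ( dept_id = 200, 50 );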
Performance Monitor
The Performance Monitor is useful for tracking detailed information about
database server actions, including disk and memory access.
With the Sybase Central Performance Monitor, you can graph a variety of
statistics of any Adaptive Server Anywhere database server that you can connect to.
Timing utilities
Some performance testing utilities, including fetchtst, instest, and trantest,
are available in the <installation-dir>\samples\asa\ directory. Complete
documentation can be found in the Readme.txt file in the same folder as the
utility. These tools will give you more accurate timings than the graphical
plan with statistics. These utilities can provide an indication of the best
achievable performance (for example, throughput) for a given server and
database configuration.
Fetchtst measures fetch rates for an arbitrary query. Instest determines the
time required for rows to be inserted into a table. Trantest measures the load
that can be handled by a given server configuration given a database design
and a set of transactions.
Concurrency
When the database server processes a transaction, it can lock one or more
rows of a table. The locks maintain the reliability of information stored in
the database by preventing concurrent access by other transactions. They
also improve the accuracy of query results by identifying information which
is in the process of being updated.
The database server places these locks automatically and needs no explicit
instruction. It holds all the locks acquired by a transaction until the
transaction is completed. The transaction that has access to the row is said to
hold the lock. Depending on the type of lock, other transactions may have
limited access to the locked row, or none at all.
Performance can be compromised if a row or rows are frequently accessed
by a number of users simultaneously. If you suspect locking problems,
consider using the sa_locks procedure to obtain information on locks in the
database. If lock issues are identified, information on the connection
processes involved can be found using the AppInfo connection property.
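For example (the connection number 42 is hypothetical):

-- List the locks currently held in the database
CALL sa_locks();

-- Identify the application behind a blocking connection
SELECT CONNECTION_PROPERTY( 'AppInfo', 42 );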
Tip
Always use a transaction log. It helps protect your data and it greatly
improves performance.
If you can store the transaction log on a different physical device than the
one containing the main database file, you can further improve performance.
The extra drive head does not generally have to seek to get to the end of the
transaction log.
On Windows CE and Novell NetWare, the size of the cache is set on the
command line when you launch the database server. Be sure to allocate as
much memory to the database cache as possible, given the requirements of
the other applications and processes that run concurrently. In particular,
databases using Java objects benefit greatly from larger cache sizes. If you
use Java in your database, consider a cache of at least 8 Mb.
Tip
Increasing the cache size can often improve performance dramatically,
since retrieving information from memory is many times faster than
reading it from disk. You may find it worthwhile to purchase more RAM
to allow a larger cache.
☞ For more information, see “Using the cache to improve performance” on
page 180.
additional files holding the same database. You choose a location for it,
appropriate to your needs.
The transaction log file is required for recovery of the information in your
database in the event of a failure. For extra protection, you can maintain a
duplicate in a third type of file called a transaction log mirror file.
Adaptive Server Anywhere writes the same information at the same time to
each of these files.
Tip
By placing the transaction log mirror file (if you use one) on a physically
separate drive, you gain better protection against disk failure, and Adaptive
Server Anywhere runs faster because it can efficiently write to the log
and log mirror files. To specify the location of the transaction log and
transaction log mirror files, use the dblog command line utility, or the
Change Log File Settings utility in Sybase Central.
Adaptive Server Anywhere may need more space than is available to it in the
cache for such operations as sorting and forming unions. When it needs this
space, it generally uses it intensively. The overall performance of your
database becomes heavily dependent on the speed of the device containing
the fourth type of file, the temporary file.
Tip
If the temporary file is on a fast device, physically separate from the one
holding the database file, Adaptive Server Anywhere will run faster. This
is because many of the operations that necessitate using the temporary
file also require retrieving a lot of information from the database. Placing
the information on two separate disks allows the operations to take place
simultaneously.
Choose the location of your temporary file carefully. Adaptive Server
Anywhere examines the following environment variables, in the order
shown, to determine the directory in which to place the temporary file.
1. ASTMP
2. TMP
3. TMPDIR
4. TEMP
If none of these is defined, Adaptive Server Anywhere places its temporary
file in the current directory—not a good location for the best performance.
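For example, on Windows you might set ASTMP before starting the server so that the temporary file is created on a fast drive separate from the database file; the drive letter, directory, and executable name are assumptions.

set ASTMP=d:\asatmp
dbsrv9 asademo.db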
☞ For information about transaction logs and the dbcc utility, see
“Transaction log utility options” [ASA Database Administration Guide,
page 543].
manual commit mode” [ASA Programming Guide, page 47].
Declare constraints
Undeclared primary key-foreign key relationships exist between tables when
there is an implied relationship between the values of columns in different
tables. Not declaring the relationship can save time on index
maintenance; however, declaring the relationship can improve the performance
of queries when joins take place because the cost model is able to do a better
job of estimation.
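For example, if the relationship between the tables used in the earlier tutorials were not already declared (the sample database does declare it), a sketch of the declaration would be:

ALTER TABLE sales_order_items
ADD FOREIGN KEY ( id ) REFERENCES sales_order ( id );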
Use an appropriate page size
The page size you choose can affect the performance of your database.
Adaptive Server Anywhere supports page sizes of (in bytes) 1024, 2048,
4096, 8192, 16384, or 32768, with 2048 being the default. There are
advantages and disadvantages to whichever page size you choose.
Smaller pages hold less information and may force less efficient use of
space, particularly if you insert rows that are slightly more than half a page
in size. However, small page sizes allow Adaptive Server Anywhere to run
with fewer resources because it can store more pages in a cache of the same
size. Small pages are particularly useful if your database must run on small
machines with limited memory. They can also help in situations when you
use your database primarily to retrieve small pieces of information from
random locations.
By contrast, larger page sizes help Adaptive Server Anywhere read
databases more efficiently. Large page sizes also tend to benefit large
databases, and queries that perform sequential table scans. Often, the
physical design of disks permits them to retrieve fewer large blocks more
efficiently than many small ones. Other benefits of large page sizes include
improving the fan-out of your indexes, thereby reducing the number of index
levels, and allowing tables to include more columns.
Keep in mind that larger page sizes have additional memory requirements.
And since the maximum number of rows stored on a page is 255, tables with
small rows will not fill each page and therefore use space inefficiently. As
well, extremely large page sizes (16 kb or 32 kb) are not recommended for
most applications unless you can be sure that a large database server cache is
always available. Investigate the effects of increased memory and disk space
on performance characteristics before using 16 kb or 32 kb page sizes.
The server’s memory usage is proportional to the number of databases
loaded, and the page size of the databases. It is strongly recommended that
you do performance testing (and testing in general) when choosing a page
size. Then choose the smallest page size (>= 4K) that gives satisfactory
results. It is particularly important to pick the correct (and reasonable) page
size if a large number of databases are going to be started on the same server.
You cannot change the page size of an existing database. Instead you must
create a new database and use the -p option of dbinit to specify the page
size. For example, the following command creates a database with 4K pages.
dbinit -p 4096 new.db
☞ For more information about larger page sizes, see “Setting a maximum
☞ For more information about detecting and fixing file, table, and index
fragmentation, see “Fragmentation” on page 198.
Eliminate operating system file fragmentation To eliminate operating system file fragmentation
problems, periodically run one of the available disk defragmentation utilities. File fragmentation can
have a detrimental impact on performance.
The database server determines the number of file fragments in the database
file when you start a database on Windows NT/2000/XP, and displays the
following information in the server message window when the number of
fragments is greater than one:
Database file "mydatabase.db" consists of nnn fragments
You can also obtain the number of database file fragments using the
DBFileFragments database property.
Minimize index fragmentation Indexes are designed to speed up searches on particular columns, but
they can become fragmented if many DELETEs are performed on the indexed
table. This may result in reduced performance if the index is accessed
frequently and the cache is not large enough to hold all of the index.
The sa_index_density stored procedure provides information about the
degree of fragmentation in a database’s indexes. You must have DBA
authority to run this procedure. The following statement calls the
sa_index_density stored procedure:
CALL sa_index_density (['table_name'[,'owner_name']])
♦ You can specify the percentage of space in a table page that should be
reserved for future updates. This PCTFREE specification can be set with
access plans that utilize an index to satisfy a query’s ORDER BY clause,
rather than plans that require an explicit sorting operation.
You can use the FASTFIRSTROW table hint in a query’s FROM clause to
set the optimization goal for a specific query to first-row, without having to
change the OPTIMIZATION_GOAL setting.
If the option is set to all-rows (the default), then Adaptive Server Anywhere
optimizes a query so as to choose an access plan with the minimal estimated
total retrieval time. Setting OPTIMIZATION_GOAL to all-rows may be
appropriate for applications that intend to process the entire result set, such
as PowerBuilder DataWindow applications.
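A sketch of both approaches follows; the option value and the hint syntax shown here are assumptions to verify against your version's SQL reference.

-- Favor returning the first rows quickly for this connection
SET TEMPORARY OPTION OPTIMIZATION_GOAL = 'first-row';

-- Or request first-row optimization for a single query with a table hint
SELECT *
FROM employee WITH ( FASTFIRSTROW )
ORDER BY emp_lname;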
Using the cache to improve performance
The database cache is an area of memory used by the database server to
store database pages for repeated fast access. The more pages that are
accessible in the cache, the fewer times the database server needs to read
data from disk. As reading data from disk is a slow operation, the amount of
cache available is often a key factor in determining performance.
You can control the size of the database cache on the database server
command line when the database is started.
Dynamic cache sizing Adaptive Server Anywhere provides automatic resizing of the database
cache. The capabilities are different on different operating systems. On
Windows NT/2000/XP, Windows 95/98/Me, and UNIX operating systems,
the cache grows and shrinks. Details are provided in the following sections.
Full dynamic cache sizing helps to ensure that the performance of your
database server is not impacted by allocating inadequate memory. The cache
grows when the database server can usefully use more, as long as memory is
available, and shrinks when cache memory is required by other applications,
so that the database server does not unduly impact other applications on the
system. The effectiveness of dynamic cache sizing is limited, of course, by
the physical memory available on your system.
Generally, dynamic cache sizing assesses cache requirements at the rate of
approximately once per minute. However, after a new database is started or
when a file grows significantly, statistics are sampled and the cache may be
resized every five seconds for thirty seconds. After the initial thirty second
period, the sampling rate drops back down to once per minute. Significant
growth of a file is defined as a 1/8 growth since the database started or since
the last growth that triggered an increase in the sampling rate. This change
improves performance further, by adapting the cache size more quickly
when databases are started dynamically and when a lot of data is inserted.
Dynamic cache sizing removes the need for explicit configuration of
database cache in many situations, making Adaptive Server Anywhere even
easier to use.
There is no dynamic cache resizing on Windows CE or Novell NetWare.
When an Address Windowing Extensions (AWE) cache is used, dynamic
cache sizing is disabled.
☞ For more information about AWE caches, see “-cw server option” [ASA
Database Administration Guide, page 139].
where dbsize is the total size of the database file or files started, and
physical-memory is 25% of the physical memory on the machine.
• Windows NT/2000/XP, Windows 95/98/Me, NetWare The formula is
as follows:
max( 2M, min( dbsize , physical-memory ) )
where dbsize is the total size of the database file or files started, and
physical-memory is 25% of the physical memory on the machine.
If an AWE cache is used on Windows 2000, Windows XP, or Windows
Server 2003 the formula is as follows:
min( 100% of available memory - 128 MB, dbsize )
You can also disable dynamic cache sizing by using the -ca command-line
option.
☞ For more information on command-line options, see “The database
server” [ASA Database Administration Guide, page 124].
If other applications on the system require memory, the database server may move cache pages out
from memory to swap space.
Initial cache size By default, the initial cache size is assigned using a heuristic based on the
available system resources. The initial cache size is always less than 1.1
times the total database size.
If the initial cache size is greater than 3/4 of the available system resources,
the database server exits with a Not Enough Memory error.
You can change the initial cache size using the -c option.
Maximum cache size The maximum cache size must be less than the available system resources on the
machine. By default, the maximum cache size is assigned using a heuristic
based on the available system resources and the total physical memory on
the machine. The cache size never exceeds the specified or implicit
maximum cache size, or the sum of the sizes of all open database and
temporary files plus the size of the main heap.
If you specify a maximum cache size greater than the available system
resources, the server exits with a Not Enough Memory error. If you specify
a maximum cache size greater than the available memory, the server warns
of performance degradation, but does not exit.
The database server allocates all the maximum cache size from the system
resources, and does not relinquish it until the server exits. You should be
sure that you choose a maximum cache size that gives good Adaptive Server
Anywhere performance while leaving space for other applications. The
formula for the default maximum cache size is a heuristic that attempts to
achieve this balance. You only need to tune the value if the default value is
not appropriate on your system.
If you specify a maximum cache size less than 8 MB, you will not be able to
run Java applications. Low maximum cache sizes will impact performance.
☞ You can use the -ch server option to set the maximum cache size, and
limit automatic cache growth. For more information, see “-ch server option”
[ASA Database Administration Guide, page 136].
Minimum cache size If the -c option is specified, the minimum cache size is the same as the initial
cache size. If no -c option is specified, the minimum cache size on UNIX is
8 MB.
You can use the -cl server option to adjust the minimum cache size.
☞ For more information, see “-cl server option” [ASA Database
Administration Guide, page 137].
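As an illustration only, the following command line combines these options to set an
initial cache of 48 MB, a maximum of 256 MB, and a minimum of 24 MB. It assumes
the dbsrv9 network server executable and the sample database:
dbsrv9 -c 48M -ch 256M -cl 24M asademo.db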
Monitoring cache size
The following statistics have been added to the Windows Performance
Monitor and to the database’s property functions.
♦ CurrentCacheSize The current cache size in kilobytes
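For example, you can retrieve this statistic from Interactive SQL with the PROPERTY
function:
SELECT PROPERTY( ’CurrentCacheSize’ )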
Using indexes to improve performance
Proper selection of indexes can make a large performance difference.
Creating and managing indexes is described in “Working with indexes” on
page 62.
The simplest way for the server to execute this query would be to look at all
75 rows in the employee table and check the employee ID number in each
row to see if it is 390. This does not take very long since there are only 75
employees, but for tables with many thousands of entries a sequential search
can take a long time.
The referential integrity constraints embodied by each primary or foreign
key are enforced by Adaptive Server Anywhere with the help of an index,
implicitly created with each primary or foreign key declaration. The emp_id
column is the primary key for the employee table. The corresponding
primary key index permits the retrieval of employee number 390 quickly.
This quick search takes almost the same amount of time whether there are
100 rows or 1,000,000 rows in the employee table.
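The query under discussion is a single-row lookup by primary key, of the following
form (reconstructed here for reference):
SELECT *
FROM employee
WHERE emp_id = 390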
Information on the Plan tab The Plan tab in the Results pane contains the following information:
employee <employee>
Whenever the name inside the angle brackets of the PLAN description on the
Plan tab is the same as the name of the table, it means that the primary key for the
table is used to improve performance.
SELECT *
FROM sales_order
WHERE cust_id = 113
Information on the Plan tab The Plan tab in the Results pane contains the following information:
sales_order <ky_so_customer>
Here ky_so_customer refers to the foreign key that the sales_order table has
for the customer table.
Queries with WHERE and ORDER BY clauses A potential problem arises when a query has both a WHERE clause and an
ORDER BY clause.
SELECT *
FROM customer
WHERE id > 300
ORDER BY company_name
Use of work tables in query processing
Work tables are materialized temporary result sets that are created during the
execution of a query. Work tables are used when Adaptive Server Anywhere
determines that the cost of using one is less than alternative strategies.
Generally, the time to fetch the first few rows is higher when a work table is
used, but the cost of retrieving all rows may be substantially lower in some
cases if a work table can be used. Because of this difference, Adaptive
Server Anywhere chooses different strategies based on the
OPTIMIZATION_GOAL setting. The default is first-row. When it is set to
first-row, Adaptive Server Anywhere tries to avoid work tables. When it is
set to all-rows, Adaptive Server Anywhere uses work tables when they
reduce the total execution cost of a query.
Work tables are used in the following cases:
When work tables occur ♦ When a query has an ORDER BY, GROUP BY, or DISTINCT clause
and Adaptive Server Anywhere does not use an index for sorting the
rows. If a suitable index exists and the OPTIMIZATION_GOAL setting
is first-row, Adaptive Server Anywhere avoids using a work table.
However, when OPTIMIZATION_GOAL is set to all-rows, it may be
more expensive to fetch all the rows of a query using an index than it is to
build a work table and sort the rows. Adaptive Server Anywhere chooses
the cheaper strategy if the optimization goal is set to all-rows. For
GROUP BY and DISTINCT, the hash-based algorithms use work tables,
but are generally more efficient when fetching all the rows out of a query.
♦ When a hash join algorithm is chosen, work tables are used to store
interim results (if the input doesn’t fit into memory) and a work table is
used to store the results of the join.
♦ When a cursor is opened with sensitive values, a work table is created to
hold the row identifiers and primary keys of the base tables. This work
table is filled in as rows are fetched from the query in the forward
direction. However, if you fetch the last row from the cursor, the entire
table is filled in.
Monitoring database performance
Adaptive Server Anywhere provides a set of statistics you can use to monitor
database performance. These statistics are accessible from Sybase Central, and client
applications can access them as functions. In addition, the server makes these
statistics available to the Windows Performance Monitor.
This section describes how to access performance and related statistics from
client applications, how to monitor database performance using Sybase
Central, how to monitor database performance using the Windows
Performance Monitor, and how to detect file, table, and index fragmentation.
Supply as an argument only the name of the property you wish to retrieve.
The functions return the value for the current server, connection, or database.
☞ For more information, see “PROPERTY function [System]” [ASA SQL
Reference, page 190], “CONNECTION_PROPERTY function [System]” [ASA
SQL Reference, page 111], and “DB_PROPERTY function [System]” [ASA SQL
Reference, page 130].
For a complete list of the properties available from the system functions, see
“System functions” [ASA SQL Reference, page 95].
Examples The following statement sets a variable named server_name to the name of
the current server:
SET server_name = property( ’name’ )
The following query returns the user ID for the current connection:
SELECT CONNECTION_PROPERTY( ’userid’ )
The following query returns the filename for the root file of the current
database:
SELECT db_property( ’file’ )
Improving query efficiency For better performance, a client application monitoring database activity
should use the property_number function to identify a named property, and
then use the number to repeatedly retrieve the statistic. The following set of
statements illustrates the process from Interactive SQL:
CREATE VARIABLE propnum INT ;
CREATE VARIABLE propval INT ;
SET propnum = property_number( ’cacheread’ );
SET propval = property( propnum )
Property names obtained in this way are available for many different
database statistics, from the number of transaction log page write operations
and the number of checkpoints carried out, to the number of reads of index
leaf pages from the memory cache.
You can view many of these statistics in graph form from the Sybase Central
database management tool.
Opening the Sybase Central Performance Monitor
You can display the Performance Monitor in the right pane of Sybase
Central when you have the Statistics folder open.
Note
The Performance Monitor only graphs statistics that you have added to it
ahead of time.
☞ See also
♦ “Adding and removing statistics” on page 194
♦ “Configuring the Sybase Central Performance Monitor” on page 195
Tip
You can also add a statistic to or remove one from the Performance Monitor
on the statistic’s property sheet.
☞ See also
♦ “Opening the Sybase Central Performance Monitor” on page 194
♦ “Configuring the Sybase Central Performance Monitor” on page 195
♦ “Monitoring database statistics from Windows Performance Monitor” on
page 195
The Windows monitor has two advantages:
♦ It offers more performance statistics (mainly those concerned with
network communications).
♦ Unlike the Sybase Central monitor, the Windows monitor is
non-intrusive. It uses a shared-memory scheme instead of performing
queries against the server, so it does not affect the statistics themselves.
☞ For a complete list of performance statistics you can monitor, see
“Performance Monitor statistics” [ASA Database Administration Guide,
page 656].
2. Choose Edit ➤ Add To Chart, or click the Plus sign button on the toolbar.
The Add To Chart dialog appears.
3. From the Object list, select one of the following:
♦ Adaptive Server Anywhere Connection To monitor performance
for a single connection. Choose a connection to monitor from the
displayed list.
♦ Adaptive Server Anywhere Database To monitor performance for a
single database. Choose a database to monitor from the displayed list.
♦ Adaptive Server Anywhere Engine To monitor performance on a
server-wide basis.
The Counter box displays a list of the statistics you can view.
For more information about the Windows Performance Monitor, see the
online help for the program.
Fragmentation
As you make changes to your database, the database file, tables, and indexes
can become fragmented. Fragmentation can decrease performance. Adaptive
Server Anywhere provides information that you can use to assess the level of
fragmentation in files, tables, and indexes.
This section describes how to detect fragmentation in files, tables, and
indexes, and how to defragment them.
File fragmentation
Performance can suffer if your database file is excessively fragmented on disk.
This file fragmentation becomes more significant as your database
increases in size.
The database server determines the number of file fragments in each dbspace
when you start a database on Windows NT/2000/XP. The server displays the
following information in the server message window when the number of
fragments is greater than one:
Database file "mydatabase.db" consists of nnn fragments
You can also obtain the number of database file fragments using the
DBFileFragments database property.
☞ For more information, see “Database-level properties” [ASA Database
Administration Guide, page 682].
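For example, the following statement retrieves the fragment count for the current
database:
SELECT DB_PROPERTY( ’DBFileFragments’ )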
Table fragmentation
When rows are not stored contiguously, or if rows are split onto more than
one page, performance decreases because these rows require additional page
accesses. Table fragmentation is distinct from file fragmentation.
Adaptive Server Anywhere reserves extra room on each page to allow rows
to grow slightly. When an update to a row causes it to grow beyond the
original space allocated for it, the row is split and the initial row location
contains a pointer to another page where the entire row is stored. For
example, filling empty rows with UPDATE statements or inserting new
columns into a table can lead to severe row splitting. As more rows are split,
more page accesses are needed to retrieve them and performance suffers.
Defragmenting tables The following procedures are useful when you detect that performance is
poor because a table is highly fragmented. Unloading and reloading the
database is more comprehensive in that it defragments all tables, including
system tables. To defragment particular tables or parts of tables, run
REORGANIZE TABLE. Reorganizing tables does not disrupt database
access.
❖ To defragment individual tables
1. Execute a REORGANIZE TABLE statement.
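For example, the following statement defragments the employee table (the table
name is illustrative):
REORGANIZE TABLE employee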
Index fragmentation
Indexes are designed to speed up searches on particular columns, but they
can become fragmented if many DELETEs are performed on the indexed
table. This may result in reduced performance if the index is accessed
frequently and the cache is not large enough to hold all of the index.
The sa_index_density stored procedure provides information about the
degree of fragmentation in a database’s indexes. You must have DBA
authority to run this procedure. The following statement calls the
sa_index_density stored procedure:
CALL sa_index_density ([’table_name’[,’owner_name’]])
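For example, the following call reports index densities for the employee table owned
by DBA (the arguments are illustrative):
CALL sa_index_density( ’employee’, ’DBA’ )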
trantest Function: Measures the load that can be handled by a given server
configuration given a database design and a set of transactions.
Location: SQL Anywhere 9\Samples\Asa\PerformanceTransaction
☞ For information about system procedures that measure query execution
times, see “sa_get_request_profile system procedure” [ASA SQL Reference,
page 760] and “sa_get_request_times system procedure” [ASA SQL Reference,
page 760].
Profiling database procedures
Procedure profiling shows you how long it takes your stored procedures,
functions, events, system triggers, and triggers to execute. You can also view
the execution time for each line of a procedure. Using the database profiling
information, you can determine which procedures can be fine-tuned to
increase performance within your database.
When profiling is enabled, Adaptive Server Anywhere monitors which
stored procedures, functions, events, system triggers, and triggers are used,
keeping track of how long it takes to execute them, and how many times
each one is called.
Profiling information is stored in memory by the server and can be viewed in
Sybase Central via the Profile tab or in Interactive SQL. Once profiling is
enabled, the database gathers profiling information until you disable
profiling or until the server is shut down.
☞ For more information about obtaining profiling information in
Interactive SQL, see “Viewing procedure profiling information in Interactive
SQL” on page 208.
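Profiling can also be enabled from Interactive SQL. The following call is a sketch; it
assumes the procedure_profiling setting of the sa_server_option system procedure:
CALL sa_server_option( ’procedure_profiling’, ’YES’ )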
❖ To disable profiling (Sybase Central)
1. Select the database in the left pane.
2. From the File menu, choose Properties.
The Database property sheet appears.
3. On the Profiling tab, clear the Enable Profiling on This Database option.
4. Click OK to close the property sheet.
Note You can also right click your database in Sybase Central to disable profiling.
From the popup menu, choose Profiling ➤ Stop Profiling.
❖ To view summary profiling information for events
1. Open the Events folder in the left pane.
A list of all the events in your database appears on the Events tab in the
right pane.
2. Click the Profile tab in the right pane.
Profiling information about all of the events within your database appears
on the Profile tab.
The procedure is broken down line by line and you can examine it to see
which lines have longer execution times and therefore might benefit from
changes to improve the procedure’s performance. You must be connected to
the database, have profiling enabled, and have DBA authority to access
procedure profiling information.
❖ To view profiling information for system triggers
1. Expand the database in the left pane.
2. Open the System Triggers folder in the left pane.
A list of all the system triggers appears on the System Triggers tab in the
right pane.
3. Select the system trigger you want to profile in the right pane.
4. Click the Profile tab in the right pane.
Profiling information about the specific system trigger appears on the
Profile tab in the right pane.
PART II
This part describes how to query and modify data. It includes several
chapters on queries, from simple to complex, as well as material on
inserting, deleting, and updating data.
CHAPTER 6
Queries: Selecting Data from a Table
About this chapter The SELECT statement retrieves data from the database. You can use it to
retrieve a subset of the rows in one or more tables and to retrieve a subset of
the columns in one or more tables.
This chapter focuses on the basics of single-table SELECT statements.
Advanced uses of SELECT are described later in this manual.
Query overview
A query requests data from the database and receives the results. This
process is also known as data retrieval. All SQL queries are expressed using
the SELECT statement.
♦ The ON clause specifies how tables in the FROM clause are to be joined.
It is used only for multi-table queries and is not discussed in this chapter.
☞ For information on multi-table queries, see “Joins: Retrieving Data
from Several Tables” on page 263.
♦ The WHERE clause specifies the rows in the tables you want to see.
♦ The GROUP BY clause allows you to aggregate data.
♦ The HAVING clause specifies rows on which aggregate data is to be
collected.
♦ The ORDER BY clause sorts the rows in the result set. (By default, rows
are returned from relational databases in an order that has no meaning.)
Most of the clauses are optional, but if they are included then they must
appear in the correct order.
☞ For more information about the SELECT statement syntax, see
“SELECT statement” [ASA SQL Reference, page 575].
SQL queries
In this manual, SELECT statements and other SQL statements appear with
each clause on a separate row, and with the SQL keywords in upper case.
This is not a requirement. You can type SQL keywords in any case, and you
can break lines at any point.
Keywords and line breaks For example, the following SELECT statement finds the first and last names
of contacts living in California from the Contact table.
SELECT first_name, last_name
FROM Contact
WHERE state = ’CA’
Case sensitivity of strings and identifiers Identifiers (that is, table names, column names, and so on) are case
insensitive in Adaptive Server Anywhere databases.
Strings are case insensitive by default, so that ‘CA’, ‘ca’, ‘cA’, and ‘Ca’ are
equivalent, but if you create a database as case-sensitive then the case of
strings is significant. The sample database is case insensitive.
Qualifying identifiers You can qualify the names of database identifiers if there is ambiguity about
which object is being referred to. For example, the sample database contains
several tables with a column called city, so you may have to qualify
references to city with the name of the table. In a larger database you may
also have to use the name of the owner of the table to identify the table.
SELECT DBA.contact.city
FROM contact
WHERE state = ’CA’
Such qualifiers are often left out of the examples for readability; it is never wrong to include
qualifiers.
The remaining sections in this chapter analyze the syntax of the SELECT
statement in more detail.
If any table or column name in the list does not conform to the rules for valid
identifiers, you must enclose the identifier in double quotes.
The select list expressions can include * (all columns), a list of column
names, character strings, column headings, and expressions including
arithmetic operators. You can also include aggregate functions, which are
discussed in “Summarizing, Grouping, and Sorting Query Results” on
page 237.
☞ For more information about what expressions can consist of, see
“Expressions” [ASA SQL Reference, page 16].
The following sections provide examples of the kinds of expressions you can
use in a select list.
SELECT * finds all the columns currently in a table, so that changes in the
structure of a table such as adding, removing, or renaming columns
automatically modify the results of SELECT *. Listing the columns
individually gives you more precise control over the results.
Example The following statement retrieves all columns in the department table. No
WHERE clause is included, so this statement retrieves every row in the
table:
SELECT *
FROM department
Like a column name, “*” can be qualified with a table name, as in the
following query:
SELECT department.*
FROM department
Rearranging the order of columns The order in which you list the column names determines the order in which
the columns are displayed. The two following examples show how to
specify column order in a display. Both of them find and display the
department names and identification numbers from all five of the rows in the
department table, but in a different order.
SELECT dept_id, dept_name
FROM department
dept_id dept_name
100 R&D
200 Sales
300 Finance
400 Marketing
... ...
SELECT dept_name, dept_id
FROM department
dept_name dept_id
R&D 100
Sales 200
Finance 300
Marketing 400
... ...
Providing an alias can produce more readable results. For example, you can
change dept_name to Department in a listing of departments as follows:
SELECT dept_name AS Department,
dept_id AS "Identifying Number"
FROM department
Department Identifying Number
R&D 100
Sales 200
Finance 300
Marketing 400
... ...
Using spaces and keywords in aliases The Identifying Number alias for dept_id is enclosed in double quotes
because it is an identifier. You also use double quotes if you wish to use
keywords in aliases. For example, the following query is invalid without the
quotation marks:
SELECT dept_name AS Department,
dept_id AS "integer"
FROM department
Tee Shirt 28 9
Tee Shirt 54 14
Tee Shirt 75 14
Tee Shirt 18
Tee Shirt 44
Tee Shirt 65
... ...
You can also combine the values in columns. The following query lists the
total value of each product in stock:
SELECT name,
quantity * unit_price AS "Inventory value"
FROM product
name Inventory value
emp_id Name
You can eliminate the duplicate entries using DISTINCT. The following
query returns only 16 rows:
SELECT DISTINCT city
FROM contact
NULL values are not distinct The DISTINCT keyword treats NULL values as duplicates of each other. In
other words, when DISTINCT is included in a SELECT statement, only one
NULL is returned in the results, no matter how many NULL values are
encountered.
The FROM clause: specifying tables
The FROM clause is required in every SELECT statement involving data
from tables, views, or stored procedures.
☞ The FROM clause can include JOIN conditions linking two or more
tables, and can include joins to other queries (derived tables). For
information on these features, see “Joins: Retrieving Data from Several
Tables” on page 263.
Qualifying table names In the FROM clause, the full naming syntax for tables and views is always
permitted, such as:
SELECT select-list
FROM owner.table_name
Qualifying table, view, and procedure names is necessary only when the
object is owned by a user ID that is not the same as the user ID of the current
connection, or is not the name of a group to which the user ID of the current
connection belongs.
Using correlation names You can give a table name a correlation name to save typing. You assign the
correlation name in the FROM clause by typing it after the table name, like
this:
SELECT d.dept_id, d.dept_name
FROM Department d
Querying from objects other than tables The most common elements in a FROM clause are table names. It is also
possible to query rows from other database objects that have a table-like
structure—that is, a well-defined set of rows and columns. In addition to
querying from tables and views, you can use derived tables (which are
SELECT statements) or stored procedures that return result sets.
For example, the following query operates on the result set of a stored
procedure.
SELECT *
FROM sp_customer_products( 149 )
☞ For more information, see “FROM clause” [ASA SQL Reference, page 469].
♦ Ranges (BETWEEN and NOT BETWEEN) For example, you can list
all employees earning between $40,000 and $60,000:
SELECT emp_lname
FROM employee
WHERE salary BETWEEN 40000 AND 60000
♦ Lists (IN, NOT IN) For example, you can list all customers in Ontario,
Quebec, or Manitoba:
SELECT company_name , state
FROM customer
WHERE state IN( ’ON’, ’PQ’, ’MB’)
♦ Character matches (LIKE and NOT LIKE) For example, you can list
all customers whose phone numbers start with 415. (The phone number
is stored as a string in the database):
SELECT company_name , phone
FROM customer
WHERE phone LIKE ’415%’
♦ Unknown values (IS NULL and IS NOT NULL) For example, you can
list all departments with managers:
SELECT dept_name
FROM Department
WHERE dept_head_id IS NOT NULL
♦ Combinations (AND, OR) For example, you can list all employees
earning over $50,000 whose first name begins with the letter A.
SELECT emp_fname, emp_lname
FROM employee
WHERE salary > 50000
AND emp_fname like ’A%’
You can also find the collation from Sybase Central. It is on the Extended
Information tab of the database property sheet.
♦ Trailing blanks When you create a database, you indicate whether
trailing blanks are to be ignored or not for the purposes of comparison.
By default, databases are created with trailing blanks not ignored. For
example, ‘Dirk’ is not the same as ‘Dirk ‘. You can create databases with
blank padding, so that trailing blanks are ignored. Trailing blanks are
ignored by default in Adaptive Server Enterprise databases.
♦ Comparing dates In comparing dates, < means earlier and > means
later.
♦ Case sensitivity When you create a database, you indicate whether
string comparisons are case sensitive or not.
The NOT operator The NOT operator negates an expression. Either of the following two
queries will find all Tee shirts and baseball caps that cost $10 or less.
However, note the difference in position between the negative logical
operator (NOT) and the negative comparison operator (!>).
SELECT id, name, quantity
FROM product
WHERE (name = ’Tee Shirt’ OR name = ’BaseBall Cap’)
AND NOT unit_price > 10
SELECT id, name, quantity
FROM product
WHERE (name = ’Tee Shirt’ OR name = ’BaseBall Cap’)
AND unit_price !> 10
❖ To list all the products with prices between $10 and $15, inclusive
1. Type the following query:
SELECT name, unit_price
FROM product
WHERE unit_price BETWEEN 10 AND 15
name unit_price
Tee Shirt 14
Tee Shirt 14
Baseball Cap 10
Shorts 15
You can use NOT BETWEEN to find all the rows that are not inside the
range.
❖ To list all the products cheaper than $10 or more expensive than
$15
1. Execute the following query:
SELECT name, unit_price
FROM product
WHERE unit_price NOT BETWEEN 10 AND 15
name unit_price
Tee Shirt 9
Baseball Cap 9
Visor 7
Visor 7
... ...
However, you get the same results if you use IN. The items following the IN
keyword must be separated by commas and enclosed in parentheses. Put
single quotes around character, date, or time values. For example:
SELECT company_name , state
FROM customer
WHERE state IN( ’ON’, ’MB’, ’PQ’)
Perhaps the most important use for the IN keyword is in nested queries, also
called subqueries.
A LIKE search condition matches a string value against a specified pattern. LIKE is used with character, binary, or date and time data.
The syntax for LIKE is:
{ WHERE | HAVING } expression [ NOT ] LIKE match-expression
Symbols Meaning
[specifier] The specifier in the brackets may be a range of characters (such as
[a-f]) or a set of characters (such as [abcdef]).
Note that the range [a-f], and the sets [abcdef] and [fcbdae],
return the same set of values.
You can match the column data to constants, variables, or other columns that
contain the wildcard characters displayed in the table. When using constants,
you should enclose the match strings and character strings in single quotes.
Examples All the following examples use LIKE with the last_name column in the
Contact table. Queries are of the form:
SELECT last_name
FROM contact
WHERE last_name LIKE match-expression
Match expression Description Returns
‘Mc%’ Search for every name that begins with the letters Mc McEvoy
‘%er’ Search for every name that ends with er Brier, Miller, Weaver, Rayner
‘%en%’ Search for every name containing the letters en Pettengill, Lencki, Cohen
‘_ish’ Search for every four-letter name ending in ish Fish
‘Br[iy][ae]r’ Search for Brier, Bryer, Briar, or Bryar Brier
Using LIKE with date and time values You can use LIKE on date and time fields as well as on character data. When
you use LIKE with date and time values, the dates are converted to the
standard DATETIME format, and then to VARCHAR.
One feature of using LIKE when searching for DATETIME values is that,
since date and time entries may contain a variety of date parts, an equality
test has to be written carefully in order to succeed.
For example, if you insert the value 9:20 and the current date into a column
named arrival_time, the clause:
WHERE arrival_time = ’9:20’
fails to find the value, because the entry holds the date as well as the time.
However, the clause below would find the 9:20 value:
WHERE arrival_time LIKE ’%09:20%’
Using NOT LIKE With NOT LIKE, you can use the same wildcard characters that you can use
with LIKE. To find all the phone numbers in the Contact table that do not
have 415 as the area code, you can use either of these queries:
SELECT phone
FROM Contact
WHERE phone NOT LIKE ’415%’
SELECT phone
FROM Contact
WHERE NOT phone LIKE ’415%’
To include a quotation mark inside a string, enclose the quotation in the other kind of quotation mark. In other words,
surround an entry containing double quotation marks with single quotation
marks, or vice versa. Here are some examples:
’George said, "There must be a better way."’
"Isn’t there a better way?"
’George asked, "Isn’’t there a better way?"’
♦ Explicit entry You can explicitly enter the value NULL by typing the
word NULL (without quotation marks).
If the word NULL is typed in a character column with quotation marks, it
is treated as data, not as a null value.
For example, the dept_head_id column of the department table allows nulls.
You can enter two rows for departments with no manager as follows:
INSERT INTO department (dept_id, dept_name)
VALUES (201, ’Eastern Sales’)
INSERT INTO department
VALUES (202, ’Western Sales’, null)
When NULLs are retrieved When NULLs are retrieved, displays of query results in Interactive SQL
show (NULL) in the appropriate position:
SELECT *
FROM department
This logic also applies when you use two column names in a WHERE
clause, that is, when you join two tables. A clause containing the condition
WHERE column1 = column2
WHERE column_name IS [NOT] NULL
For example:
WHERE advance < $5000
OR advance IS NULL
☞ For more information, see “NULL value” [ASA SQL Reference, page 49].
Properties of NULL
evaluates to true. But “not unknown” is still unknown. If null values are
included in a comparison, you cannot negate the expression to get the
opposite set of rows or the opposite truth value.
♦ Substituting a value for NULLs Use the ISNULL built-in function to
substitute a particular value for nulls. The substitution is made only for
display purposes; actual column values are not affected. The syntax is:
ISNULL( expression, value )
For example, use the following statement to select all the rows from test,
and display all the null values in column t1 with the value unknown.
SELECT ISNULL(t1, ’unknown’)
FROM test
Using OR The OR operator also connects two or more conditions, but it returns results
when any of the conditions is true. The following query searches for rows
containing variants of Elizabeth in the first_name column.
SELECT *
FROM contact
WHERE first_name = ’Beth’
OR first_name = ’Liz’
Using NOT The NOT operator negates the expression that follows it. The following
query lists all the contacts who do not live in California:
SELECT *
FROM contact
WHERE NOT state = ’CA’
When more than one logical operator is used in a statement, AND operators
are normally evaluated before OR operators. You can change the order of
execution with parentheses. For example:
SELECT *
FROM contact
WHERE ( city = ’Lexington’
OR city = ’Burlington’ )
AND state = ’MA’
CHAPTER 7
Summarizing, Grouping, and Sorting Query Results
About this chapter Aggregate functions display summaries of the values in specified columns.
You can also use the GROUP BY clause, HAVING clause, and ORDER BY
clause to group and sort the results of queries using aggregate functions, and
the UNION operator to combine the results of queries.
This chapter describes how to group and sort query results.
Summarizing query results using aggregate
functions
You can apply aggregate functions to all the rows in a table, to a subset of
the table specified by a WHERE clause, or to one or more groups of rows in
the table. From each set of rows to which an aggregate function is applied,
Adaptive Server Anywhere generates a single value.
The following are among the available aggregate functions:
♦ avg( expression ) The mean of the supplied expression over the
returned rows.
♦ count( expression ) The number of rows in the supplied group where
the expression is not NULL.
♦ count(*) The number of rows in each group.
♦ list( string-expr) A string containing a comma-separated list composed
of all the values for string-expr in each group of rows.
♦ max( expression ) The maximum value of the expression, over the
returned rows.
♦ min( expression ) The minimum value of the expression, over the
returned rows.
♦ stddev( expression ) The standard deviation of the expression, over the
returned rows.
♦ sum( expression ) The sum of the expression, over the returned rows.
♦ variance( expression ) The variance of the expression, over the
returned rows.
☞ For a complete list of aggregate functions, see “Aggregate functions”
[ASA SQL Reference, page 86].
You can use the optional keyword DISTINCT with AVG, SUM, LIST, and
COUNT to eliminate duplicate values before the aggregate function is
applied.
The expression to which the syntax statement refers is usually a column
name. It can also be a more general expression.
For example, with this statement you can find what the average price of all
products would be if one dollar were added to each price:
SELECT AVG (unit_price + 1)
FROM product
Example The following query calculates the total payroll from the annual salaries in
the employee table:
SELECT SUM(salary)
FROM employee
To use aggregate functions, you must give the function name followed by an
expression on whose values it will operate. The expression, which is the
salary column in this example, is the function’s argument and must be
specified inside parentheses.
The following restrictions now apply to the use of outer reference aggregate
functions in subqueries:
♦ The outer reference aggregate function can only appear in subqueries that
are in the SELECT list or HAVING clause, and these clauses must be in
the immediate outer block.
♦ Outer reference aggregate functions can only contain one outer column
reference.
name sum(p.quantity)
Since the outer block now computes an aggregate function, the outer block is
Using COUNT(*)
The COUNT(*) function does not require an expression as an argument
because, by definition, it does not use information about any particular
column. The COUNT(*) function finds the total number of rows in a table.
This statement finds the total number of employees:
SELECT COUNT(*)
FROM employee
count(*) AVG(product.unit_price)
5 18.2
SELECT count(DISTINCT city)
FROM contact
count(distinct contact.city)
16
You can use more than one aggregate function with DISTINCT in a query.
Each DISTINCT is evaluated independently. For example:
SELECT count( DISTINCT first_name ) "first names",
count( DISTINCT last_name ) "last names"
FROM contact
48 60
count(DISTINCT name)
0
SELECT AVG(unit_price)
FROM product
WHERE unit_price > 50
AVG(product.unit_price)
( NULL )
name Price
Sweatshirt 24
... ...
The summary values (vector aggregates) produced by SELECT statements
with aggregates and a GROUP BY appear as columns in each row of the
results. By contrast, the summary values (scalar aggregates) produced by
queries with aggregates and no GROUP BY also appear as columns, but
with only one row. For example:
SELECT AVG(unit_price)
FROM product
AVG(product.unit_price)
13.3
Understanding GROUP BY
Understanding which queries are valid and which are not can be difficult
when the query involves a GROUP BY clause. This section describes a way
to think about queries with GROUP BY so that you may understand the
results and the validity of queries better.
[Diagram: the GROUP BY clause transforms the intermediate result into a second intermediate result.]
5. Project out the results to display. This action takes from step 3 only
those columns that need to be displayed in the result set of the query;
that is, it takes only those columns corresponding to the expressions from the
select-list.
[Diagram: a projection of the second intermediate result produces the final result.]
GROUP BY with multiple columns
You can list more than one expression in the GROUP BY clause in order to
nest groups—that is, you can group a table by any combination of
expressions.
The following query lists the average price of products, grouped first by
name and then by size:
SELECT name, size, AVG(unit_price)
FROM product
GROUP BY name, size
Only the rows with id values of more than 400 are included in the groups
that are used to produce the query results.
Example The following query illustrates the use of WHERE, GROUP BY, and HAVING clauses in a single query:
name SUM(product.quantity)
The HAVING clause: selecting groups of data
The HAVING clause restricts the rows returned by a query. It sets conditions
for the GROUP BY clause similar to the way in which WHERE sets
conditions for the SELECT clause.
The HAVING clause search conditions are identical to WHERE search
conditions except that WHERE search conditions cannot include aggregates,
while HAVING search conditions often do. The example below is legal:
HAVING AVG(unit_price) > 20
Using HAVING with The following statement is an example of simple use of the HAVING clause
aggregate functions with an aggregate function.
To list those products available in more than one size or color, you need a
query to group the rows in the product table by name, but eliminate the
groups that include only one distinct product:
SELECT name
FROM product
GROUP BY name
HAVING COUNT(*) > 1
name
Tee Shirt
Baseball Cap
Visor
Sweatshirt
☞ For information about when you can use aggregate functions in
HAVING clauses, see “Where you can use aggregate functions” on
page 239.
Using HAVING without The HAVING clause can also be used without aggregates.
aggregate functions
The following query groups the products, and then restricts the result set to
only those groups for which the name starts with B.
SELECT name
FROM product
GROUP BY name
HAVING name LIKE ’B%’
name
Baseball Cap
More than one condition in HAVING More than one condition can be included in the HAVING clause. They are
combined with the AND, OR, or NOT operators, as in the following
example.
To list those products available in more than one size or color, for which one
version costs more than $10, you need a query to group the rows in the
product table by name, but eliminate the groups that include only one
distinct product, and eliminate those groups for which the maximum unit
price is under $10.
SELECT name
FROM product
GROUP BY name
HAVING COUNT(*) > 1
AND MAX(unit_price) > 10
name
Tee Shirt
Sweatshirt
The ORDER BY clause: sorting query results
The ORDER BY clause allows sorting of query results by one or more
columns. Each sort can be ascending (ASC) or descending (DESC). If
neither is specified, ASC is assumed.
A simple example The following query returns results ordered by name:
SELECT id, name
FROM product
ORDER BY name
id name
700 Shorts
600 Sweatshirt
... ...
Sorting by more than one column If you name more than one column in the ORDER BY clause, the sorts are
nested.
The following statement sorts the shirts in the product table first by name in
ascending order, then by quantity (descending) within each name:
SELECT id, name, quantity
FROM product
WHERE name like ’%shirt%’
ORDER BY name, quantity DESC
id name quantity
600 Sweatshirt 39
601 Sweatshirt 32
302 Tee Shirt 75
Most versions of SQL require that ORDER BY items appear in the select
list, but Adaptive Server Anywhere has no such restriction. The following
query orders the results by quantity, although that column does not appear in
the select list:
SELECT id, name
FROM product
WHERE name like ’%shirt%’
ORDER BY 2, quantity DESC
ORDER BY and NULL With ORDER BY, NULL sorts before all other values in ascending sort
order.
ORDER BY and case sensitivity The effects of an ORDER BY clause on mixed-case data depend on the
database collation and case sensitivity specified when the database is created.
The following query returns the first five employees as sorted by last name:
SELECT TOP 5 *
FROM employee
ORDER BY emp_lname
When you use TOP, you can also use START AT to provide an offset. The
following statement lists the fifth and sixth employees sorted in descending
order by last name:
SELECT TOP 2 START AT 5 *
FROM employee
ORDER BY emp_lname DESC
FIRST and TOP should be used only in conjunction with an ORDER BY
clause to ensure consistent results. Use of FIRST or TOP without an
ORDER BY triggers a syntax warning, and will likely yield unpredictable
results.
Note
The ‘start at’ value must be greater than 0. The ‘top’ value must be greater
than 0 when it is a constant, and greater than or equal to 0 when it is a variable.
name AVG(product.unit_price)
Visor 7
Shorts 15
... ...
and z. In the UNION between that set and x, duplicates are not eliminated.
In the second expression, duplicates are included in the union between x and
y, but are then eliminated in the subsequent union with z.
The INTERSECT operation lists the rows that appear in each of two result
sets. The following general construction lists all those rows that appear in
the result set of both query-1 and query-2.
query-1
INTERSECT
query-2
Like the UNION operation, both EXCEPT and INTERSECT take the ALL
modifier, which prevents the elimination of duplicate rows from the result
set.
For more information, see “EXCEPT operation” [ASA SQL Reference,
page 447], and “INTERSECT operation” [ASA SQL Reference, page 512].
Alternatively, you can use the WITH clause to define the column names.
For example:
WITH V( Cities )
AS ( SELECT city
FROM contact
UNION
SELECT city
FROM customer )
SELECT * FROM V
♦ Ordering the results You can use the WITH clause of the SELECT statement to name the
columns of the result so that you can refer to them in an ORDER BY clause. For example:
WITH V( cityname )
AS ( SELECT Cities = city
FROM contact
UNION
SELECT city
FROM customer )
SELECT * FROM V
ORDER BY cityname
Alternatively, you can use a single ORDER BY clause at the end of the
list of queries, but you must use integers rather than column names, as in
the following example:
SELECT Cities = city
FROM contact
UNION
SELECT city
FROM customer
ORDER BY 1
♦ Table T1
col1 col2
1 a
2 b
3 (NULL)
3 (NULL)
4 (NULL)
4 (NULL)
♦ Table T2
col1 col2
1 a
2 x
3 (NULL)
One query that asks for rows in T1 that also appear in T2 is as follows:
SELECT T1.col1, T1.col2
FROM T1 JOIN T2
ON T1.col1 = T2.col1
AND T1.col2 = T2.col2
T1.col1 T1.col2
1 a
The row ( 3, NULL ) does not appear in the result set, as the comparison
between NULL and NULL is not true. In contrast, approaching the problem
using the INTERSECT operator includes a row with NULL:
SELECT col1, col2
FROM T1
INTERSECT
SELECT col1, col2
FROM T2
col1 col2
1 a
3 (NULL)
The following query uses search conditions to list rows in T1 that do not
appear in T2:
SELECT col1, col2
FROM T1
WHERE col1 NOT IN (
SELECT col1
FROM t2
WHERE t1.col2 = t2.col2 )
OR col2 NOT IN (
SELECT col2
FROM t2
WHERE t1.col1 = t2.col1 )
col1 col2
2 b
3 (NULL)
4 (NULL)
3 (NULL)
4 (NULL)
The NULL-containing rows from T1 are not excluded by the comparison. In
contrast, approaching the problem using EXCEPT ALL excludes
NULL-containing rows that appear in both tables. In this case, the (3,
NULL) row in T2 is identified as the same as the (3, NULL) row in T1.
SELECT col1, col2
FROM T1
EXCEPT ALL
SELECT col1, col2
FROM T2
col1 col2
2 b
3 (NULL)
4 (NULL)
4 (NULL)
The EXCEPT operator is more restrictive still. It eliminates both (3, NULL)
rows from T1 and excludes one of the (4, NULL) rows as a duplicate.
col1 col2
2 b
4 (NULL)
Standards and compatibility
This section describes standards and compatibility aspects of the Adaptive
Server Anywhere GROUP BY clause.
♦ GROUP BY and ALL Adaptive Server Anywhere does not support the
use of ALL in the GROUP BY clause.
CHAPTER 8
Joins: Retrieving Data from Several Tables
About this chapter When you create a database, you normalize the data by placing information
specific to different objects in different tables, rather than in one large table
with many redundant entries.
A join operation recreates a larger table using the information from two or
more tables (or views). Using different joins, you can construct a variety of
these virtual tables, each suited to a particular task.
Before you start This chapter assumes some knowledge of queries and the syntax of the
SELECT statement. Information about queries appears in “Queries: Selecting
Data from a Table” on page 213.
Sample database schema
This chapter makes frequent reference to the sample database. In the
following diagram, the sample database is shown with the names of the
foreign keys that relate the tables. The sample database is held in a file
called asademo.db, and is located in your installation directory.
[Diagram: the asademo.db sample database schema. It shows the tables product,
sales_order_items, employee, customer, sales_order, fin_code, fin_data, contact, and
department, together with the foreign keys that relate them: ky_prod_id
(sales_order_items.prod_id references product.id), id_fk (sales_order_items.id
references sales_order.id), ky_so_customer (sales_order.cust_id references
customer.id), ky_so_employee_id (sales_order.sales_rep references
employee.emp_id), ky_so_fincode (sales_order.fin_code_id references
fin_code.code), ky_code_data (fin_data.code references fin_code.code), ky_dept_id
(employee.dept_id references department.dept_id), and ky_dept_head
(department.dept_head_id references employee.emp_id).]
Joins overview
A join is an operation that combines the rows in tables by comparing the
values in specified columns. This section is an overview of Adaptive Server
Anywhere join syntax. All of the concepts are explored in greater detail in
later sections.
table or view:
[userid.] table-or-view-name [ [ AS ] correlation-name ]
derived table:
( select-statement ) [ AS ] correlation-name [ ( column-name, . . . ) ]
joined table:
table_expression join_operator table_expression [ ON join_condition ]
join_operator :
[ KEY | NATURAL ] [ join_type ] JOIN | CROSS JOIN
join_type:
INNER | FULL [ OUTER ] | LEFT [ OUTER ] | RIGHT [ OUTER ]
☞ For the syntax of the ON phrase, see “Search conditions” [ASA SQL
Reference, page 23].
Join conditions
Tables can be joined using join conditions. A join condition is simply a
search condition. It chooses a subset of rows from the joined tables based on
the relationship between values in the columns. For example, the following
query retrieves data from the product and sales_order_items tables.
SELECT *
FROM product JOIN sales_order_items
ON product.id = sales_order_items.prod_id
This join condition means that rows can be combined in the result set only if
they have the same product ID in both tables.
Join conditions can be explicit or generated. An explicit join condition is a
join condition that is put in an ON phrase or a WHERE clause. The
following query uses an ON phrase. It produces a cross product of the two
tables (all combinations of rows), but with rows excluded if the id numbers
do not match. The result is a list of customers with details of their orders.
SELECT *
FROM customer JOIN sales_order
ON sales_order.cust_id = customer.id
Tip: Both key join syntax and natural join syntax are shortcuts: you get
identical results from using the keyword JOIN without KEY or NATURAL,
and then explicitly stating the same join condition in an ON phrase.
When you use an ON phrase with a key join or natural join, the join
condition that is used is the conjunction of the explicitly specified join
condition with the generated join condition. This means that the join
conditions are combined with the keyword AND.
Joined tables
Adaptive Server Anywhere supports the following classes of joined tables.
♦ CROSS JOIN A cross join of two tables produces all possible
combinations of rows from the two tables. The size of the result set is the
number of rows in the first table multiplied by the number of rows in the
second table. A cross join is also called a cross product or Cartesian
product. You cannot use an ON phrase with a cross join.
♦ KEY JOIN (default) A join condition is automatically generated
based on the foreign key relationships that have been built into the
database. Key join is the default when the JOIN keyword is used without
specifying a join type and there is no ON phrase.
♦ Exclude all rows where the product IDs are not identical (because of the
join condition product.id = sales_order_items.prod_id).
♦ Exclude all rows where the quantity is not identical (because of the join
condition product.quantity =
sales_order_items.quantity).
For example, the statement
SELECT *
FROM A JOIN B JOIN C JOIN D
is equivalent to
SELECT *
FROM ((A JOIN B) JOIN C) JOIN D
Whenever more than two tables are joined, the join involves table
expressions. In the example A JOIN B JOIN C, the table expression A
JOIN B is joined to C. This means, conceptually, that A and B are joined,
and then the result is joined to C.
The order of joins is important if the table expression contains outer joins.
For example, A JOIN B LEFT OUTER JOIN C is interpreted as (A JOIN
B) LEFT OUTER JOIN C. This means that the table expression A JOIN B
is joined to C. The table expression A JOIN B is preserved and table C is
null-supplying.
☞ For more information about outer joins, see “Outer joins” on page 276.
☞ For more information about how Adaptive Server Anywhere performs a
key join of table expressions, see “Key joins of table expressions” on
page 297.
☞ For more information about how Adaptive Server Anywhere performs a
natural join of table expressions, see “Natural joins of table expressions” on
page 291.
Non-ANSI joins
☞ Adaptive Server Anywhere supports ISO/ANSI standards for joins. It
also supports the following non-standard joins:
♦ “Transact-SQL outer joins (*= or =*)” on page 280
☞ You can use the REWRITE function to see the ANSI equivalent of a
non-ANSI join.
☞ For more information, see “REWRITE function [Miscellaneous]” [ASA
SQL Reference, page 204].
Tables that can be referenced The tables that are referenced in an ON phrase must be part of the join that
the ON phrase modifies. For example, the following is invalid:
FROM (A KEY JOIN B) JOIN (C JOIN D ON A.x = C.x)
The problem is that the join condition A.x = C.x references table A, which
is not part of the join it modifies (in this case, C JOIN D).
However, as of the ANSI/ISO standard SQL99 and Adaptive Server
Anywhere 7.0, there is an exception to this rule: if you use commas between
table expressions, an ON condition of a join can reference a table that
precedes it syntactically in the FROM clause. Therefore, the following is
valid:
FROM (A KEY JOIN B) , (C JOIN D ON A.x = C.x)
The following is a join between table A and table B with the join condition
A.x = B.y. It is not a key join.
SELECT *
FROM A JOIN B ON A.x = B.y
If you specify a KEY JOIN or NATURAL JOIN and use an ON phrase, the
final join condition is the conjunction of the generated join condition and the
explicit join condition(s). For example, the following statement has two join
conditions: one generated because of the key join, and one explicitly stated
in the ON phrase.
SELECT *
FROM A KEY JOIN B ON A.x = B.y
If the join condition generated by the key join is A.w = B.z, then the
following statement is equivalent:
SELECT *
FROM A JOIN B
ON A.x = B.y
AND A.w = B.z
☞ For more information about key joins, see “Key joins” on page 294.
Types of explicit join conditions
Most join conditions are based on equality, and so are called equijoins. For
example,
SELECT *
FROM department JOIN employee
ON department.dept_id = employee.dept_id
However, you do not have to use equality (=) in a join condition. You can
use any search condition, such as conditions containing LIKE, SOUNDEX,
BETWEEN, > (greater than), and != (not equal to).
Example The following example answers the question: For which products has
someone ordered more than the quantity in stock?
SELECT DISTINCT product.name
FROM product JOIN sales_order_items
ON product.id = sales_order_items.prod_id
AND product.quantity > sales_order_items.quantity
For more information about search conditions, see “Search conditions” [ASA
SQL Reference, page 23].
Cross joins
A cross join of two tables produces all possible combinations of rows from
the two tables. A cross join is also called a cross product or Cartesian
product.
Each row of the first table appears once with each row of the second table.
Hence, the number of rows in the result set is the product of the number of
rows in the first table and the number of rows in the second table, minus any
rows that are omitted because of restrictions in a WHERE clause.
You cannot use an ON phrase with cross joins. However, you can put
restrictions in a WHERE clause.
Inner and outer modifiers do not apply to cross joins Except in the presence of additional restrictions in the WHERE clause, all rows of both tables always appear in the result set of cross joins. Thus, the keywords INNER, LEFT OUTER and RIGHT OUTER are not applicable to cross joins.
For example, the following statement joins two tables.
SELECT *
FROM A CROSS JOIN B
The result set from this query includes all columns in A and all columns in
B. There is one row in the result set for each combination of a row in A and
a row in B. If A has n rows and B has m rows, the query returns n x m rows.
Commas
A comma works like a join operator, but is not one. A comma creates a cross
product exactly as the keyword CROSS JOIN does. However, join keywords
create table expressions, and commas create lists of table expressions.
In the following simple inner join, a comma and the keywords CROSS JOIN are equivalent:
SELECT *
FROM A CROSS JOIN B CROSS JOIN C
WHERE A.x = B.y
and
SELECT *
FROM A, B, C
WHERE A.x = B.y
Generally, you can use a comma instead of the keywords CROSS JOIN. The
comma syntax is equivalent to cross join syntax, except in the case of
generated join conditions in table expressions using commas.
Inner and outer joins
The keywords INNER, LEFT OUTER, RIGHT OUTER, and FULL
OUTER may be used to modify key joins, natural joins, and joins with an
ON phrase. The default is INNER. The keyword OUTER is optional. These
modifiers do not apply to cross joins.
Inner joins
By default, joins are inner joins. This means that rows are included in the
result set only if they satisfy the join condition.
Example For example, each row of the result set of the following query contains the
information from one customer row and one sales_order row, satisfying the
key join condition. If a particular customer has placed no orders, the
condition is not satisfied and the result set does not contain the row
corresponding to that customer.
SELECT fname, lname, order_date
FROM customer KEY INNER JOIN sales_order
ORDER BY order_date
Outer joins
A left or right outer join of two tables preserves all the rows in one table,
and supplies nulls for the other table when it does not meet the join
condition. A left outer join preserves every row in the left-hand table, and a
right outer join preserves every row in the right-hand table. In a full outer
join, all rows from both tables are preserved.
The table expressions on either side of a left or right outer join are referred to as the preserved table expression and the null-supplying table expression.
♦ Include one row for every customer who has not placed any sales orders.
This ensures that every row in the customer table is included. For all of
these rows, the columns from sales_order are filled with nulls. These
rows are added because the keyword OUTER is used, and would not have
appeared in an inner join. Neither the ON condition nor the WHERE
clause is used for this step.
♦ Exclude every row where the customer does not live in New York, using
the WHERE clause.
Outer joins and join conditions
A common mistake with outer joins is the placement of the join condition.
In most cases, if you place restrictions on the null-supplying table in a
WHERE clause, the join is equivalent to an inner join.
The reason for this is that most search conditions cannot evaluate to TRUE
when any of their inputs are NULL. The WHERE clause restriction on the
null-supplying table compares values to null, resulting in the elimination of
the row from the result set. The rows in the preserved table are not preserved
and so the join is an inner join.
The exception to this is comparisons that can evaluate to true when any of
their inputs are NULL. These include IS NULL, IS UNKNOWN, IS FALSE,
IS NOT TRUE, and expressions involving ISNULL or COALESCE.
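To illustrate the pitfall, the following sketch (using the sample database's customer and sales_order tables) places the date restriction in the WHERE clause instead of the ON phrase; because the comparison is never TRUE for the null-supplied rows, customers with no orders are eliminated and the result is effectively an inner join:
SELECT fname, lname, order_date
FROM customer LEFT OUTER JOIN sales_order
ON customer.id = sales_order.cust_id
WHERE sales_order.order_date < '2000-01-03'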
Example For example, the following statement computes a left outer join.
SELECT *
FROM customer KEY LEFT OUTER JOIN sales_order
ON sales_order.order_date < ’2000-01-03’
Next, you may want to convert the right outer join to a left outer join so that
both joins are the same type. To do this, simply reverse the position of the
tables in the right outer join, resulting in:
SELECT *
FROM C LEFT OUTER JOIN (A LEFT OUTER JOIN B)
A is the preserved table and B is the null-supplying table for the nested outer
join. C is the preserved table for the first outer join.
You can interpret this join as follows:
♦ Join A to B, preserving all rows in A.
♦ Next, join C to the results of the join of A and B, preserving all rows in C.
The join does not have an ON phrase, and so is by default a key join. The
way Adaptive Server Anywhere generates join conditions for this type of
join is explained in “Key joins of table expressions that do not contain
commas ” on page 298.
In addition, the join condition for an outer join must only include tables that
have previously been referenced in the FROM clause. This restriction is
according to the ANSI/ISO standard, and is enforced to avoid ambiguity. For
example, the following two statements are syntactically incorrect, because C
is referenced in the join condition before the table itself is referenced.
SELECT *
FROM (A LEFT OUTER JOIN B ON B.x = C.x) JOIN C
and
SELECT *
FROM A LEFT OUTER JOIN B ON A.x = C.x, C
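A valid formulation lists C before the join whose ON condition references it, using the comma rule described earlier in this chapter; for example (a sketch with the same hypothetical tables):
SELECT *
FROM C, A LEFT OUTER JOIN B ON A.x = C.x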
Outer joins of views and derived tables
Outer joins can also be specified for views and derived tables.
The statement
SELECT *
FROM V LEFT OUTER JOIN A ON (V.x = A.x)
Next, use this view to add a list of the departments where the women work
and the regions where they have sold. The view V is preserved and
sales_order is null-supplying.
SELECT DISTINCT V.emp_id, region, V.dept_name
FROM V LEFT OUTER JOIN sales_order
ON V.emp_id = sales_order.sales_rep
Warning: When you are creating outer joins, do not mix *= syntax with
ON phrase syntax. This also applies to views that are referenced in the
query.
In the Transact-SQL dialect, you create outer joins by supplying a
comma-separated list of tables in the FROM clause, and using the special
operators *= or =* in the WHERE clause. In Adaptive Server Enterprise
prior to version 12, the join condition must appear in the WHERE clause
(ON was not supported).
Example For example, the following left outer join lists all customers and finds their
order dates (if any):
SELECT fname, lname, order_date
FROM customer, sales_order
WHERE customer.id *= sales_order.cust_id
ORDER BY order_date
♦ A null-supplying table cannot participate in both a Transact-SQL outer
join and a regular join or two outer joins. For example, the following
WHERE clause is not allowed, because table S violates this limitation.
WHERE R.x *= S.x
AND S.y = T.y
When you cannot rewrite your query to avoid using a table in both an
outer join and a regular join clause, you must divide your statement into
two separate queries, or use only ANSI/ISO SQL syntax.
♦ You cannot use a subquery that contains a join condition involving the
null-supplying table of an outer join. For example, the following
WHERE clause is not allowed:
WHERE R.x *= S.y
AND EXISTS ( SELECT *
FROM T
WHERE T.x = S.x )
If you define a view with an outer join, and then query the view with a
qualification on a column from the null-supplying table of the outer join, the
results may not be what you expect. The query returns all rows from the
null-supplying table. Rows that do not meet the qualification show a NULL
value in the appropriate columns of those rows.
The following rules determine what types of updates you can make to
columns through views that contain outer joins:
♦ INSERT and DELETE statements are not allowed on outer join views.
♦ UPDATE statements are allowed on outer join views. If the view is
defined WITH CHECK option, the update fails if any of the affected
columns appears in the WHERE clause in an expression that includes
columns from more than one table.
NULL values in tables or views being joined never match each other in a
Transact-SQL outer join. The result of comparing a NULL value with any
other NULL value is FALSE.
Specialized joins
This section describes how to create some specialized joins such as
self-joins, star joins, and joins using derived tables.
Self-joins
In a self-join, a table is joined to itself by referring to the same table using a
different correlation name.
Example 1 The following self-join produces a list of pairs of employees. Each
employee name appears in combination with every employee name.
SELECT a.emp_fname, a.emp_lname,
b.emp_fname, b.emp_lname
FROM employee AS a CROSS JOIN employee AS b
the following two rows.
This statement produces the result shown partially below. The employee
names appear in the two left-hand columns, and the names of their managers
are on the right.
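The statement discussed here falls on a page that is not reproduced above. A sketch of such a manager self-join, assuming the employee table's emp_id and manager_id columns as used elsewhere in this chapter, might look like this:
SELECT a.emp_fname, a.emp_lname,    -- employee names (left-hand columns)
       b.emp_fname, b.emp_lname     -- their managers' names (right)
FROM employee AS a JOIN employee AS b
ON a.manager_id = b.emp_id
ORDER BY a.emp_lname, a.emp_fname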
ANSI/ISO SQL standard. The ability to use duplicate names does not add
any additional functionality, but it makes it much easier to formulate certain
queries.
The duplicate names must be in different joins for the syntax to make sense.
When a table name or view name is used twice in the same join, the second
instance is ignored. For example, FROM A,A and FROM A CROSS JOIN A
are both interpreted as FROM A.
The following example, in which A, B and C are tables, is valid in Adaptive
Server Anywhere. In this example, the same instance of table A is joined
both to B and C. Note that a comma is required to separate the joins in a star
join. The use of a comma in star joins is specific to the syntax of star joins.
SELECT *
FROM A LEFT OUTER JOIN B ON A.x = B.x,
A LEFT OUTER JOIN C ON A.y = C.y
With complex joins, it can help to draw a diagram. The previous example
can be described by the following diagram, which illustrates that tables B, C
and D are joined via table A.
[Diagram: tables B, C, and D each joined to table A]
Note You can use duplicate table names only if the EXTENDED_JOIN_SYNTAX option is ON (the default).
For more information, see the “EXTENDED_JOIN_SYNTAX option
[database]” [ASA Database Administration Guide, page 610].
Example 1 Create a list of the names of the customers who have placed orders with
Rollin Overbey. Notice that one of the tables in the FROM clause, employee,
does not contribute any columns to the results. Nor do any of the columns
that are joined—such as customer.id or employee.id—appear in the results.
Nonetheless, this join is possible only using the employee table in the
FROM clause.
SELECT customer.fname, customer.lname,
sales_order.order_date
FROM sales_order KEY JOIN customer,
sales_order KEY JOIN employee
WHERE employee.emp_fname = ’Rollin’
AND employee.emp_lname = ’Overbey’
ORDER BY sales_order.order_date
Example 2 This example answers the question: How much of each product has each
customer ordered, and who is the manager of the salesperson who took the
order?
To answer the question, start by listing the information you need to retrieve.
In this case, it is product, quantity, customer name, and manager name.
Next, list the tables that hold this information. They are product,
sales_order_items, customer, and employee. When you look at the structure
of the sample database (see “Sample database schema” on page 264), you
will notice that these tables are all related through the sales_order table. You
can create a star join on the sales_order table to retrieve the information
from the other tables.
In addition, you need to create a self-join in order to get the name of the
manager, because the employee table contains ID numbers for managers and
the names of all employees, but not a column listing only manager names.
For more information, see “Self-joins” on page 283.
The following statement creates a star join around the sales_order table. The
joins are all outer joins so that the result set will include all customers. Some
customers have not placed orders, so the other values for these customers are
NULL. The columns in the result set are customer, product, quantity
ordered, and the name of the manager of the salesperson.
SELECT customer.fname, product.name,
SUM(sales_order_items.quantity), m.emp_fname
FROM sales_order
KEY RIGHT OUTER JOIN customer,
sales_order
KEY LEFT OUTER JOIN sales_order_items
KEY LEFT OUTER JOIN product,
sales_order
KEY LEFT OUTER JOIN employee AS e
LEFT OUTER JOIN employee AS m
ON (e.manager_id = m.emp_id)
WHERE customer.state = ’CA’
GROUP BY customer.fname, product.name, m.emp_fname
ORDER BY SUM(sales_order_items.quantity) DESC, customer.fname
The result set has the columns fname, name, SUM(sales_order_items.quantity), and emp_fname.
[Diagram: a star join centered on the sales_order table, which is joined to customer, to sales_order_items and product, and to employee AS e, which in turn is joined to employee AS m]
The result is a table of the names of those customers who have placed more
than three orders, including the number of orders each has placed.
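The query that produces this result is not reproduced above. A sketch of a join using a derived table that answers the same question, assuming the sample database's customer and sales_order tables, might be:
SELECT lname, fname, number_of_orders
FROM customer JOIN
     ( SELECT cust_id, count(*) AS number_of_orders
       FROM sales_order
       GROUP BY cust_id ) AS order_counts
ON customer.id = order_counts.cust_id
WHERE number_of_orders > 3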
☞ For an explanation of key joins of derived tables, see “Key joins of
views and derived tables” on page 303.
☞ For an explanation of natural joins of derived tables, see “Natural joins
of views and derived tables” on page 292.
☞ For an explanation of outer joins of derived tables, see “Outer joins of
views and derived tables” on page 280.
Natural joins
When you specify a natural join, Adaptive Server Anywhere generates a join
condition based on columns with the same name. For this to work in a
natural join of base tables, there must be at least one pair of columns with
the same name, with one column from each table. If there is no common
column name, an error is issued.
If table A and table B have one column name in common, and that column is called x, then
SELECT *
FROM A NATURAL JOIN B
is equivalent to A JOIN B ON A.x = B.x.
If table A and table B have two column names in common, and they are
called a and b, then A NATURAL JOIN B is equivalent to the following:
A JOIN B
ON A.a = B.a
AND A.b = B.b
Example For example, you can join the employee and department tables using a
natural join because they have a column name in common, the dept_id
column.
SELECT emp_fname, emp_lname, dept_name
FROM employee NATURAL JOIN department
ORDER BY dept_name, emp_lname, emp_fname
The next query is equivalent. In it, the natural join condition that was
generated in the previous query is specified in the ON phrase.
SELECT emp_fname, emp_lname, dept_name
FROM employee JOIN department
ON employee.manager_id = department.dept_head_id
AND employee.dept_id = department.dept_id
there are two table expressions. The column names in the table expression
A JOIN B are compared to the column names in the table expression
C JOIN D, and a join condition is generated for each unambiguous pair of
matching column names. An unambiguous pair of matching columns means
that the column name occurs in both table expressions, but does not occur
twice in the same table expression.
If there is a pair of ambiguous column names, an error is issued. However, a
column name may occur twice in the same table expression, as long as it
doesn’t also match the name of a column in the other table expression.
Natural joins of lists When a list of table expressions is on at least one side of a natural join, a
separate join condition is generated for each table expression in the list.
Consider the following tables:
♦ table A consists of columns called a, b and c
♦ table B consists of columns called a and d
♦ table C consists of columns called d and c
In this case, the join (A,B) NATURAL JOIN C causes Adaptive Server
Anywhere to generate two join conditions:
ON A.c = C.c
AND B.d = C.d
This is equivalent to
SELECT *
FROM (employee KEY JOIN sales_order)
JOIN (sales_order_items KEY JOIN product)
ON sales_order.id = sales_order_items.id
the columns in View1 are compared to the columns in View2. If, for
example, a column called emp_id is found to occur in both views, and there
are no other columns that have identical names, then the generated join
condition is (View1.emp_id = View2.emp_id).
Example The following example illustrates that a view used in a natural join can
include expressions, and not just columns, and they are treated the same way
in the natural join. First, create the view V with a column called x, as
follows:
CREATE VIEW V(x) AS
SELECT R.y + 1
FROM R
Next, create a natural join of the view to a derived table. The derived table
has a correlation name T with a column called x.
SELECT *
FROM V NATURAL JOIN (SELECT P.y FROM P) as T(x)
Key joins
When you specify a key join, Adaptive Server Anywhere generates a join
condition based on the foreign key relationships in the database. To use a
key join, there must be a foreign key relationship between the tables, or an
error is issued.
The key join is a Sybase extension to the ANSI/ISO SQL standard. It does
not provide any greater functionality, but it makes it easier to formulate
certain queries.
When key join is the default Key join is the default in Adaptive Server Anywhere when all of the following apply:
♦ the keyword JOIN is used.
♦ the keywords CROSS, NATURAL or KEY are not specified.
♦ there is no ON phrase.
Example For example, the following query is a simple key join that joins the tables
product and sales_order_items based on the foreign key relationship in the
database.
SELECT *
FROM product KEY JOIN sales_order_items
The next query is equivalent. It leaves out the word KEY, but by default a
JOIN without an ON phrase is a KEY JOIN.
SELECT *
FROM product JOIN sales_order_items
The next query is also equivalent, because the join condition specified in the
ON phrase happens to be the same as the join condition that Adaptive Server
Anywhere generates for these tables based on their foreign key relationship
in the sample database.
SELECT *
FROM product JOIN sales_order_items
ON sales_order_items.prod_id = product.id
If the join condition generated by the key join of A and B is A.w = B.z,
then this query is equivalent to
SELECT *
FROM A JOIN B
ON A.x = B.y AND A.w = B.z
If you don’t know the role name of a foreign key, you can find it in Sybase
Central by expanding the database container in the left pane. Select the table
in left pane, and then click the Foreign Keys tab in the right pane. A list of
foreign keys for that table appears in the right pane.
☞ See “Sample database schema” on page 264 for a diagram that includes
the role names of all foreign keys in the sample database.
Generating join conditions Adaptive Server Anywhere looks for a foreign key that has the same role name as the correlation name of the primary key table:
♦ If there is exactly one foreign key with the same name as a table in the
join, Adaptive Server Anywhere uses it to generate the join condition.
♦ If there is more than one foreign key with the same name as a table, the
join is ambiguous and an error is issued.
♦ If there is no foreign key with the same name as the table, Adaptive
Server Anywhere looks for any foreign key relationship, even if the
names don’t match. If there is more than one foreign key relationship, the
join is ambiguous and an error is issued.
Example 1 In the sample database, two foreign key relationships are defined between
the tables employee and department: the foreign key ky_dept_id in the
employee table references the department table; and the foreign key
ky_dept_head in the department table references the employee table.
[Diagram: the employee table (emp_id <pk>, manager_id, emp_fname, emp_lname, dept_id <fk>, street, city, state, zip_code, phone, status, ss_number, salary, start_date, termination_date, birth_date, bene_health_ins, bene_life_ins, bene_day_care, sex) and the department table (dept_id <pk>, dept_name, dept_head_id <fk>), related by the foreign key ky_dept_id (employee.dept_id = department.dept_id) and the foreign key ky_dept_head (department.dept_head_id = employee.emp_id)]
The following query is ambiguous because there are two foreign key
relationships and neither has the same role name as the primary key table
name. Therefore, attempting this query results in the syntax error
SQLE_AMBIGUOUS_JOIN (-147).
SELECT employee.emp_lname, department.dept_name
FROM employee KEY JOIN department
Example 2 This query modifies the query in Example 1 by specifying the correlation
name ky_dept_id for the department table. Now, the foreign key ky_dept_id
has the same name as the table it references, and so it is used to define the
join condition. The result includes all the employee last names and the
departments where they work.
SELECT employee.emp_lname, ky_dept_id.dept_name
FROM employee KEY JOIN department AS ky_dept_id
Example 3 If the intent was to list all the employees that are the head of a department,
then the foreign key ky_dept_head should be used and Example 1 should be
rewritten as follows. This query imposes the use of the foreign key
ky_dept_head by specifying the correlation name ky_dept_head for the
primary key table employee.
SELECT ky_dept_head.emp_lname, department.dept_name
FROM employee AS ky_dept_head KEY JOIN department
The following query is equivalent. The join condition that was generated
above is specified in the ON phrase in this query:
SELECT employee.emp_lname, department.dept_name
FROM employee JOIN department
ON department.dept_head_id = employee.emp_id
Example 4 A correlation name is not needed if the foreign key role name is identical to
the primary key table name. For example, we can define the foreign key
department for the employee table:
ALTER TABLE employee ADD FOREIGN KEY department (dept_id)
REFERENCES department (dept_id)
Now, this foreign key relationship is the default join condition when a KEY
JOIN is specified between the two tables. If the foreign key department is
defined, then the following query is equivalent to Example 3.
SELECT employee.emp_lname, department.dept_name
FROM employee KEY JOIN department
Note If you try this example in Interactive SQL, you should reverse the change to
the sample database with the following statement:
ALTER TABLE employee DROP FOREIGN KEY department
The table-pairs are A-C, A-D, B-C and B-D. Adaptive Server Anywhere
considers the relationship within each pair and then creates a generated join
condition for the table expression as a whole. How Adaptive Server
Anywhere does this depends on whether the table expressions use commas
or not. Therefore, the generated join conditions in the following two
examples are different. A JOIN B is a table expression that does not contain
commas, and (A,B) is a table expression list.
SELECT *
FROM (A JOIN B) KEY JOIN C
The two types of join behavior are explained in the following sections:
♦ “Key joins of table expressions that do not contain commas ” on page 298
♦ “Key joins of table expression lists” on page 299
When both of the two table expressions being joined do not contain commas,
Adaptive Server Anywhere examines the foreign key relationships in the
pairs of tables in the statement, and generates a single join condition.
For example, the following join has two table-pairs, A-C and B-C.
(A NATURAL JOIN B) KEY JOIN C
To generate a join condition for the key join of two table expression lists,
Adaptive Server Anywhere examines the pairs of tables in the statement, and
generates a join condition for each pair. The final join condition is the
conjunction of the join conditions for each pair. There must be a foreign key
relationship between each pair.
The following example joins two table-pairs, A-C and B-C.
SELECT *
FROM (A,B) KEY JOIN C
Adaptive Server Anywhere generates the join condition for the key join of C with (A,B) by generating a join condition for each of the two pairs A-C and B-C.
It does so according to the rules for key joins when there are multiple foreign
key relationships:
♦ For each pair, Adaptive Server Anywhere looks for a foreign key that has
the same role name as the correlation name of the primary key table. If
there is exactly one foreign key meeting this criterion, it uses it. If there is
more than one, the join is considered to be ambiguous and an error is
issued.
♦ For each pair, if there is no foreign key with the same name as the
correlation name of the table, Adaptive Server Anywhere looks for any
foreign key relationship between the tables. If there is one, it uses it. If
there is more than one, the join is considered to be ambiguous and an
error is issued.
♦ For each pair, if there is no foreign key relationship, an error is issued.
♦ If Adaptive Server Anywhere is able to determine exactly one join
condition for each pair, it combines the join conditions using AND.
☞ For more information, see “Key joins when there are multiple foreign
key relationships” on page 295.
Example The following query returns the names of all salespeople who have sold at
least one order to a specific region.
SELECT DISTINCT employee.emp_lname,
ky_dept_id.dept_name, sales_order.region
FROM (sales_order, department AS ky_dept_id)
KEY JOIN employee
For the pair department AS ky_dept_id and employee, there is one foreign
key that has the same role name as the primary key table. It is ky_dept_id,
and it matches the correlation name given to the department table in the
query. There are no other foreign keys with the same name as the correlation
name of the primary key table, so ky_dept_id is used to form the join
condition for the table-pair. The join condition that is generated is
(employee.dept_id = ky_dept_id.dept_id). Note that there is
another foreign key relating the two tables, but as it has a different name
from either of the tables, it is not a factor.
The final join condition is the conjunction of the join conditions generated for each table-pair. Therefore, the following query is equivalent:
SELECT DISTINCT employee.emp_lname, department.dept_name,
sales_order.region
FROM ( sales_order, department )
JOIN employee
ON employee.emp_id = sales_order.sales_rep
AND employee.dept_id = department.dept_id
Key joins of lists and table expressions that do not contain commas
When table expression lists are joined via key join with table expressions
that do not contain commas, Adaptive Server Anywhere generates a join
condition for each table in the table expression list.
For example, the following statement is the key join of a table expression list
with a table expression that does not contain commas. This example
generates a join condition for table A with table expression
C NATURAL JOIN D, and for table B with table expression
C NATURAL JOIN D.
SELECT *
FROM (A,B) KEY JOIN (C NATURAL JOIN D)
as the correlation name of a table, Adaptive Server Anywhere looks for
any foreign key relationship between the tables. If there is exactly one
relationship, it uses it. If there is more than one, the join is ambiguous
and an error is issued.
♦ For each set of pairs, if there is no foreign key relationship, an error is
issued.
♦ If Adaptive Server Anywhere is able to determine exactly one join
condition for each set of pairs, it combines the join conditions with the
keyword AND.
Example 1 Consider the following join of five tables:
((A,B) JOIN (C NATURAL JOIN D) ON A.x = D.y) KEY JOIN E
The definition of View1 can be any of the following and result in the same
join condition to B. (The result set will differ, but the join conditions will be
identical.)
SELECT *
FROM C CROSS JOIN D
or
SELECT *
FROM C,D
or
SELECT *
FROM C JOIN D ON (C.x = D.y)
In each case, to generate a join condition for the key join of View1 and B,
Adaptive Server Anywhere considers the table-pairs C-B and D-B, and
generates a single join condition. It generates the join condition based on the
rules for multiple foreign key relationships described in “Key joins of table
expressions” on page 297, except that it looks for a foreign key with the
same name as the correlation name of the view (rather than a table
referenced in the view).
Using any of the view definitions above, you can interpret the processing of
View1 KEY JOIN B as follows:
Adaptive Server Anywhere generates a single join condition by considering
the table-pairs C-B and D-B. It generates the join condition according to the
rules for determining key joins when there are multiple foreign key
relationships:
♦ First, it looks at both C-B and D-B for a single foreign key that has the
same role name as the correlation name of the view. If there is exactly
one foreign key meeting this criterion, it uses it. If there is more than one
foreign key with the same role name as the correlation name of the view,
the join is considered to be ambiguous and an error is issued.
♦ If there is no foreign key with the same name as the correlation name of
the view, Adaptive Server Anywhere looks for any foreign key
relationship between the tables. If there is one, it uses it. If there is more
than one, the join is considered to be ambiguous and an error is issued.
♦ If there is no foreign key relationship, an error is issued.
Assume this generated join condition is B.y = D.z. We can now expand
the original join.
SELECT *
FROM View1 KEY JOIN B
is equivalent to
SELECT *
FROM View1 JOIN B ON B.y = View1.z
☞ For more information, see “Key joins when there are multiple foreign
key relationships” on page 295.
Example 2 The following view contains all the employee information about the
manager of each department.
CREATE VIEW V AS
SELECT department.dept_name, employee.*
FROM employee JOIN department
ON employee.emp_id = department.dept_head_id
This is equivalent to
SELECT *
FROM V JOIN (sales_order, department ky_dept_id)
ON (V.emp_id = sales_order.sales_rep
AND V.dept_id = ky_dept_id.dept_id)
Rule 2: key join of table expressions that do not contain commas This rule applies to A KEY JOIN B, where A and B are table expressions that do not contain commas.
1. For each pair of tables, one from expression A and one from expression
B, list all foreign keys, and mark all preferred foreign keys between the
tables. The rule for determining a preferred foreign key is given in Rule
1, above.
2. If there is more than one preferred key, then the join is ambiguous. The
syntax error SQLE_AMBIGUOUS_JOIN (-147) is issued.
3. If there is a single preferred key, then this foreign key is chosen to define
the generated join condition for this KEY JOIN expression.
Rule 3: key join of table expression lists This rule applies to (A1, A2, ...) KEY JOIN (B1, B2, ...), where A1, B1, and so on are table expressions that do not contain commas.
1. For each pair of table expressions Ai and Bj, find a unique generated join
condition for the table expression (Ai KEY JOIN Bj) by applying Rule
1 or 2. If any KEY JOIN for a pair of table expressions is ambiguous by
Rule 1 or 2, a syntax error is generated.
2. The generated join condition for this KEY JOIN expression is the
conjunction of the join conditions found in step 1.
Rule 4: key join of lists and table expressions that do not contain commas This rule applies to (A1, A2, ...) KEY JOIN (B1, B2, ...), where A1, B1, and so on are table expressions that may contain commas.
1. For each pair of table expressions Ai and Bj, find a unique generated join
condition for the table expression (Ai KEY JOIN Bj) by applying Rule
1, 2, or 3. If any KEY JOIN for a pair of table expressions is ambiguous
by Rule 1, 2, or 3, then a syntax error is generated.
2. The generated join condition for this KEY JOIN expression is the
conjunction of the join conditions found in step 1.
CHAPTER 9
Common Table Expressions
About this chapter The WITH prefix to a SELECT statement lets you define common table expressions. These can be used like temporary views
within your query. This chapter describes how to use them.
About common table expressions
Common table expressions are temporary views that are known only within
the scope of a single SELECT statement. They permit you to write queries
more easily, and to write queries that could not otherwise be expressed.
Common table expressions are useful or may be necessary if a query
involves multiple aggregate functions or defines a view within a stored
procedure that references program variables. Common table expressions
also provide a convenient means to temporarily store sets of values.
Recursive common table expressions permit you to query tables that
represent hierarchical information, such as reporting relationships within a
company. They can also be used to solve parts explosion problems and least
distance problems.
☞ For information about recursive queries, see “Recursive common table
expressions” on page 316.
For example, consider the problem of determining which department has the
most number of employees. The employee table in the sample database lists
all the employees in a fictional company and specifies in which department
each works. The following query lists the department ID codes and the total
number of employees in each department.
SELECT dept_id, count(*) AS n
FROM employee
GROUP BY dept_id
This query can be used to extract the department with the most employees as
follows:
SELECT dept_id, n
FROM ( SELECT dept_id, count(*) AS n
FROM employee GROUP BY dept_id ) AS a
WHERE a.n =
( SELECT max(n)
FROM ( SELECT dept_id, count(*) AS n
FROM employee GROUP BY dept_id ) AS b )
While this statement provides the correct result, it has some disadvantages.
The first disadvantage is that the repeated subquery makes this statement
clumsy. The second is that this statement provides no clear link between the
subqueries.
One way around these problems is to create a view, then use it to re-express
the query. This approach avoids the problems mentioned above.
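The rewritten query itself is not reproduced above; a sketch of the same question expressed with a common table expression (using the CountEmployees definition that also appears below) might be:
WITH CountEmployees( dept_id, n ) AS
( SELECT dept_id, count(*) AS n
  FROM employee GROUP BY dept_id )
SELECT dept_id, n
FROM CountEmployees
WHERE n = ( SELECT max( n ) FROM CountEmployees )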
WITH CountEmployees(dept_id, n) AS
( SELECT dept_id, count(*) AS n
FROM employee GROUP BY dept_id )
SELECT a.dept_id, a.n, b.dept_id, b.n
FROM CountEmployees AS a JOIN CountEmployees AS b
ON a.n = b.n AND a.dept_id < b.dept_id
Multiple table expressions A single WITH clause may define more than one common table expression. These definitions must be separated by commas. The following example
lists the department that has the smallest payroll and the department that has
the largest number of employees.
WITH
CountEmployees(dept_id, n) AS
( SELECT dept_id, count(*) AS n
FROM employee GROUP BY dept_id ),
DeptPayroll( dept_id, amt ) AS
( SELECT dept_id, sum(salary) AS amt
FROM employee GROUP BY dept_id )
SELECT count.dept_id, count.n, pay.amt
FROM CountEmployees AS count JOIN DeptPayroll AS pay
ON count.dept_id = pay.dept_id
WHERE count.n = ( SELECT max(n) FROM CountEmployees )
OR pay.amt = ( SELECT min(amt) FROM DeptPayroll )
Typical applications of common table expressions
In general, common table expressions are useful whenever a table expression
must appear multiple times within a single query. The following typical
situations are suited to common table expressions.
♦ Queries that involve multiple aggregate functions.
♦ Views within a procedure that must contain a reference to a program
variable.
♦ Queries that use temporary views to store a set of values.
This list is not exhaustive. You may encounter many other situations in
which common table expressions are useful.
Multiple aggregate functions Common table expressions are useful whenever multiple levels of aggregation must appear within a single query. This is the case in the
example used in the previous section. The task was to retrieve the
department ID of the department that has the most employees. To do so, the
count aggregate function is used to calculate the number of employees in
each department and the max function is used to select the largest
department.
A similar situation arises when writing a query to determine which
department has the largest payroll. The sum aggregate function is used to
calculate each department’s payroll and the max function to determine
which is largest. The presence of both functions in the query is a clue that a
common table expression may be helpful.
WITH DeptPayroll( dept_id, amt ) AS
( SELECT dept_id, sum(salary) AS amt
FROM employee GROUP BY dept_id )
SELECT dept_id, amt
FROM DeptPayroll
WHERE amt = ( SELECT max(amt)
FROM DeptPayroll )
Views that reference program variables Sometimes, it can be convenient to create a view that contains a reference to a program variable. For example, you may define a variable within a
procedure that identifies a particular customer. You want to query the
customer’s purchase history, and as you will be accessing similar
information multiple times or perhaps using multiple aggregate functions,
you want to create a view that contains information about that specific
customer.
You cannot create a view that references a program variable because there is no way to limit the scope of a view to that of your procedure. Once created, a view can be used in other contexts. You can, however, use a common table expression instead.
The above query is the basis of the common table expression that appears in
the following procedure. The ID number of the sales representative and the
year in question are incoming parameters. As this procedure demonstrates,
the procedure parameters and any declared local variables can be referenced
within the WITH clause.
CREATE PROCEDURE sales_rep_total (
IN rep INTEGER,
IN yyyy INTEGER )
BEGIN
DECLARE start_date DATE;
DECLARE end_date DATE;
SET start_date = YMD( yyyy, 1, 1 );
SET end_date = YMD( yyyy, 12, 31 );
WITH total_sales_by_rep ( sales_rep_name,
sales_rep_id,
month,
order_year,
total_sales ) AS
( SELECT emp_fname || ’ ’ || emp_lname AS sales_rep_name,
sales_rep AS sales_rep_id, month( order_date),
year(order_date),
sum( p.unit_price * i.quantity ) AS total_sales
FROM employee LEFT OUTER JOIN sales_order o
INNER JOIN sales_order_items i
INNER JOIN product p
WHERE start_date <= order_date AND
order_date <= end_date AND
sales_rep = rep
GROUP BY year(order_date), month(order_date),
emp_fname, emp_lname, sales_rep )
SELECT sales_rep_name,
monthname( YMD(yyyy, month, 1) ) AS month_name,
order_year,
total_sales
FROM total_sales_by_rep
WHERE total_sales =
( SELECT max( total_sales) FROM total_sales_by_rep )
ORDER BY order_year ASC, month ASC;
END;
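For example, assuming the procedure above, you could call it for sales representative 667 (an ID that appears in the sample data used elsewhere in this manual) as follows:
CALL sales_rep_total( 667, 2000 );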
Views that store values Sometimes, it can be useful to store a particular set of values within a
SELECT statement or within a procedure. For example, suppose a company
prefers to analyze the results of its sales staff by thirds of a year, instead of
by quarter. Since there is no built-in date part for thirds, as there is for
quarters, it is necessary to store the dates within the procedure.
This method should be used with care, as the values may need periodic
maintenance. For example, the above statement must be modified if it is to
be used for any other year.
You can also apply this technique within procedures. The following example
declares a procedure that takes the year in question as an argument.
CREATE PROCEDURE sales_by_third ( IN y INTEGER )
BEGIN
WITH thirds (q_name, q_start, q_end) AS
( SELECT ’T1’, YMD( y, 01, 01), YMD( y, 04, 30) UNION
SELECT ’T2’, YMD( y, 05, 01), YMD( y, 08, 31) UNION
SELECT ’T3’, YMD( y, 09, 01), YMD( y, 12, 31) )
SELECT q_name,
sales_rep,
count(*) AS num_orders,
sum(p.unit_price * i.quantity) AS total_sales
FROM thirds JOIN sales_order AS o
ON q_start <= order_date AND order_date <= q_end
KEY JOIN sales_order_items AS i
KEY JOIN product AS p
GROUP BY q_name, sales_rep
ORDER BY q_name, sales_rep;
END;
CALL sales_by_third (2000);
Recursive common table expressions
Common table expressions may be recursive. Common table expressions are
recursive when the RECURSIVE keyword appears immediately after WITH.
A single WITH clause may contain multiple recursive expressions, and may
contain both recursive and non-recursive common table expressions.
Recursive common table expressions provide a convenient way to write
queries that return relationships to an arbitrary depth. For example, given a
table that represents the reporting relationships within a company, you can
readily write a query that returns all the employees that report to one
particular person.
Depending on how you write the query, you can either limit the number of
levels of recursion or you can provide no limit. Limiting the number of
levels permits you to return only the top levels of management, for example,
but may exclude some employees if the chains of command are longer than
you anticipated. Providing no restriction on the number of levels ensures no
employees will be excluded, but can introduce infinite recursion should the
graph contain any cycles; for example, if an employee directly or indirectly
reports to himself. This situation could arise within a company’s
management hierarchy if, for example, an employee within the company
also sits on the board of directors.
Recursion provides a much easier means of traversing tables that represent
tree or tree-like data structures. The only way to traverse such a structure in
a single statement without using recursive expressions is to join the table to
itself once for each possible level. For example, if a reporting hierarchy
contains at most seven levels, you must join the employee table to itself
seven times. If the company reorganizes and a new management level is
introduced, you must rewrite the query.
Recursive common table expressions contain an initial subquery, or seed,
and a recursive subquery that during each iteration appends additional rows
to the result set. The two parts can be connected only with the operator
UNION ALL. The initial subquery is an ordinary non-recursive query and is
processed first. The recursive portion contains a reference to the rows added
during the previous iteration. Recursion stops automatically whenever an
iteration generates no new rows. There is no way to reference rows selected
prior to the previous iteration.
The select list of the recursive subquery must match that of the initial
subquery in number and data type. If automatic translation of data types
cannot be performed, explicitly cast the results of one subquery so that they
match those in the other subquery.
The condition within the recursive query that restricts the management level
to less than 20 is an important precaution. It prevents infinite recursion in the
event that the table data contains a cycle.
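The recursive query referred to here falls on a page that is not reproduced above. A sketch of this kind of query, assuming the employee table's emp_id and manager_id columns and an arbitrary top-level manager ID of 501 (purely for illustration), might look like this:
WITH RECURSIVE manager_hierarchy ( emp_id, emp_lname, mgmt_level ) AS
( SELECT emp_id, emp_lname, 0             -- seed: the top-level manager
  FROM employee
  WHERE emp_id = 501
  UNION ALL
  SELECT e.emp_id, e.emp_lname, m.mgmt_level + 1
  FROM manager_hierarchy m JOIN employee e
  ON e.manager_id = m.emp_id
  AND e.emp_id <> m.emp_id                -- ignore self-managed rows
  WHERE m.mgmt_level < 20 )               -- stop condition: guards against cycles
SELECT emp_id, emp_lname, mgmt_level
FROM manager_hierarchy
ORDER BY mgmt_level, emp_lname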
The MAX_RECURSIVE_ITERATIONS option The option MAX_RECURSIVE_ITERATIONS is designed to catch runaway recursive queries. The default value of this option is 100. Recursive queries that exceed this number of levels of recursion terminate, but cause an error.
Although this option may seem to diminish the importance of a stop
condition, this is not usually the case. The number of rows selected during
each iteration may grow exponentially, seriously impacting database
performance before the maximum is reached. Stop conditions within
recursive queries provide a means of setting appropriate limits in each
situation.
recursive common table expressions cannot be mutually recursive.
However, non-recursive common table expressions can contain
references to recursive ones, and recursive common table expressions can
contain references to non-recursive ones.
♦ The only set operator permitted between the initial subquery and the
recursive subquery is UNION ALL. No other set operators are permitted.
♦ Within the definition of a recursive subquery, a self-reference to the
recursive table expression can appear only within the FROM clause of the
recursive subquery.
♦ When a self-reference appears within the FROM clause of the recursive
subquery, the reference to the recursive table cannot appear on the
null-supplying side of an outer join.
[Diagram: a parts hierarchy for a bookcase, with edges from bookcase to back, side, shelf, foot, and screw, and from back, side, and shelf down to backboard, plank, and screw, each edge labeled with the quantity required]
The information in the table below represents the edges of the bookshelf
graph. The first column names a component, the second column names one
of the subcomponents of that component, and the third column specifies how
many of the subcomponents are required.
component subcomponent quantity
bookcase back 1
bookcase side 2
bookcase shelf 3
bookcase foot 4
bookcase screw 4
back backboard 1
back screw 8
side plank 1
shelf plank 1
shelf screw 4
The following statements create the bookcase table and insert the data shown
in the above table.
CREATE TABLE bookcase (
component VARCHAR(9),
subcomponent VARCHAR(9),
quantity integer,
PRIMARY KEY (component, subcomponent)
);
INSERT INTO bookcase
SELECT ’bookcase’, ’back’, 1 UNION
SELECT ’bookcase’, ’side’, 2 UNION
SELECT ’bookcase’, ’shelf’, 3 UNION
SELECT ’bookcase’, ’foot’, 4 UNION
SELECT ’bookcase’, ’screw’, 4 UNION
SELECT ’back’, ’backboard’, 1 UNION
SELECT ’back’, ’screw’, 8 UNION
SELECT ’side’, ’plank’, 1 UNION
SELECT ’shelf’, ’plank’, 1 UNION
SELECT ’shelf’, ’screw’, 4;
After you have created the bookcase table, you can recreate the table of its parts, shown above, with the following query.
SELECT * FROM bookcase
ORDER BY component, subcomponent;
With this table constructed, you can generate a list of the primitive parts and
the quantity of each required to construct the bookcase.
WITH RECURSIVE parts (component, subcomponent, quantity) AS
( SELECT component, subcomponent, quantity
FROM bookcase WHERE component = ’bookcase’
UNION ALL
SELECT b.component, b.subcomponent, p.quantity * b.quantity
FROM parts p JOIN bookcase b ON p.subcomponent = b.component )
SELECT subcomponent, sum(quantity) AS quantity
FROM parts
WHERE subcomponent NOT IN ( SELECT component FROM bookcase )
GROUP BY subcomponent
ORDER BY subcomponent;
subcomponent quantity
backboard 1
foot 4
plank 5
screw 24
Alternatively, you can rewrite this query to perform an additional level of
recursion, thus avoiding the need for the subquery in the main SELECT
statement:
WITH RECURSIVE parts (component, subcomponent, quantity) AS
( SELECT component, subcomponent, quantity
FROM bookcase WHERE component = ’bookcase’
UNION ALL
SELECT p.subcomponent, b.subcomponent,
IF b.quantity IS NULL
THEN p.quantity
ELSE p.quantity * b.quantity
ENDIF
FROM parts p LEFT OUTER JOIN bookcase b
ON p.subcomponent = b.component
WHERE p.subcomponent IS NOT NULL
)
SELECT component, sum(quantity) FROM parts
WHERE subcomponent IS NULL
GROUP BY component
ORDER BY component;
The results of this query are identical to those of the previous query.
Data type declarations in recursive common table
expressions
The data types of the columns in the temporary view are defined by those of
the initial subquery. The data types of the columns from the recursive
subquery must match. The database server automatically attempts to convert
the values returned by the recursive subquery to match those of the initial
query. If this is not possible, or if information may be lost in the conversion,
an error is generated.
In general, explicit casts are often required when the initial subquery returns
a literal value or NULL. Explicit casts may also be required when the initial
subquery selects values from different columns than the recursive subquery.
Casts may be required if the columns of the initial subquery do not have the
same domains as those of the recursive subquery. Casts must always be
applied to NULL values in the initial subquery.
For example, the bookshelf parts explosion sample works correctly because
the initial subquery returns rows from the bookcase table, and thus inherits
the data types of the selected columns.
☞ For more information, see “Parts explosion problems” on page 319.
If this query is rewritten as follows, explicit casts are required.
WITH RECURSIVE parts (component, subcomponent, quantity) AS
( SELECT NULL, ’bookcase’, 1 -- ERROR! Wrong domains!
UNION ALL
SELECT b.component, b.subcomponent,
p.quantity * b.quantity
FROM parts p JOIN bookcase b
ON p.subcomponent = b.component )
SELECT * FROM parts
ORDER BY component, subcomponent
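A corrected form of the seed row, with explicit casts so the literal and the NULL match the domains of the bookcase columns (VARCHAR(9), as defined earlier), might look like this:
WITH RECURSIVE parts (component, subcomponent, quantity) AS
( SELECT CAST( NULL AS VARCHAR(9) ),        -- cast NULL to the component domain
         CAST( 'bookcase' AS VARCHAR(9) ),  -- cast the literal to the subcomponent domain
         1
  UNION ALL
  SELECT b.component, b.subcomponent,
         p.quantity * b.quantity
  FROM parts p JOIN bookcase b
  ON p.subcomponent = b.component )
SELECT * FROM parts
ORDER BY component, subcomponent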
Least distance problem
You can use recursive common table expressions to find desirable paths on a
directed graph. Each row in a database table represents a directed edge.
Each row specifies an origin, a destination, and a cost of traveling from the
origin to the destination. Depending on the problem, the cost may represent
distance, travel time, or some other measure. Recursion permits you to
explore possible routes through this graph. From the set of possible routes,
you can then select the ones that interest you.
For example, consider the problem of finding a desirable way to drive
between the cities of Kitchener and Pembroke. There are quite a few
possible routes, each of which takes you through a different set of
intermediate cities. The goal is to find the shortest routes, and to compare
them to reasonable alternatives.
[Diagram: a road map showing cities including Kitchener, Toronto, Barrie, North Bay, Belleville, Ottawa, and the destination Pembroke, with a distance marked on each road segment; for example, Kitchener to Toronto is 105 and Toronto to Belleville is 190]
First, define a table to represent the edges of this graph and insert one row
for each edge. Since all the edges of this graph happen to be bi-directional,
the edges that represent the reverse directions must be inserted also. This is
done by selecting the initial set of rows, but interchanging the origin and
destination. For example, one row must represent the trip from Kitchener to
Toronto, and another row the trip from Toronto back to Kitchener.
CREATE TABLE travel (
origin VARCHAR(10),
destination VARCHAR(10),
distance INT,
PRIMARY KEY (origin, destination)
);
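The INSERT statements for the edges are not reproduced here. As a sketch (the two distances shown are the ones legible on the map above; the remaining roads would be inserted the same way), the forward edges can be inserted and then reversed as follows:
INSERT INTO travel (origin, destination, distance)
SELECT 'Kitchener', 'Toronto', 105 UNION
SELECT 'Toronto', 'Belleville', 190;
-- Add the reverse direction for every edge by interchanging origin and destination
INSERT INTO travel (origin, destination, distance)
SELECT destination, origin, distance FROM travel;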
The next task is to write the recursive common table expression. Since the
trip will start in Kitchener, the initial subquery begins by selecting all the
possible paths out of Kitchener, along with the distance of each.
The recursive subquery extends the paths. For each path, it adds segments
that continue along from the destinations of the previous segments, adding
the length of the new segments so as to maintain a running total cost of each
route. For efficiency, routes are terminated if they meet any of the
following conditions:
♦ The path returns to the starting location.
♦ The path returns to the previous location.
♦ The path reaches the desired destination.
In the current example, no path should return to Kitchener and all paths
should be terminated if they reach Pembroke.
It is particularly important to guarantee that recursive queries will terminate
properly when using them to explore cyclic graphs. In this case, the above
conditions are insufficient, as a route may include an arbitrarily large
number of trips back and forth between two intermediate cities. The
recursive query below guarantees termination by limiting the maximum
number of segments in any given route to seven.
Since the point of the example query is to select a practical route, the main
query selects only those routes that are less than 50 percent longer than the
shortest route.
WITH RECURSIVE
trip (route, destination, previous, distance, segments) AS
( SELECT CAST(origin || ’, ’ || destination AS VARCHAR(256)),
destination, origin, distance, 1
FROM travel
WHERE origin = ’Kitchener’
UNION ALL
SELECT route || ’, ’ || v.destination,
v.destination, -- current endpoint
v.origin, -- previous endpoint
t.distance + v.distance, -- total distance
segments + 1 -- total number of segments
FROM trip t JOIN travel v ON t.destination = v.origin
WHERE v.destination <> ’Kitchener’ -- Don’t return to start
AND v.destination <> t.previous -- Prevent backtracking
AND v.origin <> ’Pembroke’ -- Stop at the end
AND segments -- TERMINATE RECURSION!
< ( SELECT count(*)/2 FROM travel ) )
SELECT route, distance, segments FROM trip
WHERE destination = ’Pembroke’ AND
distance < 1.5 * ( SELECT min(distance)
FROM trip
WHERE destination = ’Pembroke’ )
ORDER BY distance, segments, route;
When run against the above data set, this statement yields the following
results.
CREATE PROCEDURE best_routes (
IN initial VARCHAR(10),
IN final VARCHAR(10)
)
BEGIN
WITH RECURSIVE
trip (route, destination, previous, distance, segments) AS
( SELECT CAST(origin || ’, ’ || destination AS VARCHAR(256)),
destination, origin, distance, 1
FROM travel
WHERE origin = initial
UNION ALL
SELECT route || ’, ’ || v.destination,
v.destination, -- current endpoint
v.origin, -- previous endpoint
t.distance + v.distance, -- total distance
segments + 1 -- total number of segments
FROM trip t JOIN travel v ON t.destination = v.origin
WHERE v.destination <> initial -- Don’t return to start
AND v.destination <> t.previous -- Prevent backtracking
AND v.origin <> final -- Stop at the end
AND segments -- TERMINATE RECURSION!
< ( SELECT count(*)/2 FROM travel ) )
SELECT route, distance, segments FROM trip
WHERE destination = final AND
distance < 1.4 * ( SELECT min(distance)
FROM trip
WHERE destination = final )
ORDER BY distance, segments, route;
END;
CALL best_routes ( ’Pembroke’, ’Kitchener’ );
CHAPTER 10
Using OLAP
About this chapter This chapter describes online analytical processing (OLAP) functionality.
OLAP is a method of data analysis for information stored in relational
databases. Using OLAP you can acquire result sets with subtotalled rows
and organize data into multidimensional cubes. It lets you use filters so that
you can drill down into the data and is designed to return result sets quickly.
Understanding OLAP
Online analytical processing (OLAP) lets you analyze the data stored in
relational databases. The result sets you obtain using OLAP can have
subtotalled rows and can be organized into multidimensional cubes.
OLAP gives you the ability to do time series analyses, cost allocations, goal
seeking, ad hoc multidimensional structural changes, non-procedural
modeling, exception alerting, and data mining. Adaptive Server Anywhere
provides a multidimensional conceptual view of the data, including full
support for hierarchies and multiple hierarchies.
OLAP functionality lets you complete the following tasks:
♦ calculations applied across dimensions
♦ calculations applied through hierarchies
♦ trend analysis over sequences of time
Understanding subtotals
Subtotal rows can help you analyze data, especially if there are large
amounts of data, different dimensions to the data, data contained in different
tables, or even different databases altogether. For example, a sales manager
might find reports on sales figures broken down by sales representative,
region, and quarter to be useful in understanding patterns in sales. Subtotals
for the data give the sales manager a picture of overall sales from different
perspectives. Analyzing this data is easier when summary information is
provided based on the criteria that the sales manager wants to compare.
With OLAP, the procedure for analyzing and computing row and column
subtotals is invisible to the end-user. The following diagram shows
conceptually how Adaptive Server Anywhere creates subtotals:
[Diagram: (1) the query is calculated, (2) subtotals are attached to the result set, (3) variables are arranged by the ORDER BY clause]
Note
This sequence is seamless to the end-user.
In the example above, the group-by-list includes two variables (Year and
Quarter).
GROUP BY ROLLUP (Year, Quarter)
Note
There are the same number of prefixes as there are items in the group-by-list.
In this query, the prefix containing the Year column leads to a summary
row for Year=2000 and a summary row for Year=2001. There is a single
summary row for the prefix that has no columns, which is a subtotal over
all rows in the intermediate result set.
The value of each column in a subtotal row is as follows:
• Column included in the prefix The value of the column. For
example, in the preceding query, the value of the Year column for the
subtotal over rows with Year=2000 is 2000.
• Column excluded from the prefix NULL. For example, the Quarter
column has a value of NULL for the subtotal rows generated by the
prefix consisting of the Year column.
NULL values and subtotal rows
When rows in the input to a GROUP BY operation contain NULL, there is
the possibility of confusion between subtotal rows added by the ROLLUP,
CUBE, or GROUPING SETS operations and rows that contain NULL
values that are part of the original input data.
The GROUPING function distinguishes subtotal rows from others by taking
a column in the group by list as its argument, and returning 1 if the column
is NULL because the row is a subtotal row, and 0 otherwise.
Example The following example includes GROUPING columns in the result set.
Rows are highlighted that contain NULL as a result of the input data, not
because they are subtotal rows. The GROUPING columns are highlighted.
The query is an outer join between the employee table and the sales_order
table. The query selects female employees who live in either Texas, New
York, or California. NULL appears in the columns corresponding to those
female employees who are not sales representatives (and therefore have no
sales).
SELECT employee.emp_id AS Employee,
year(order_date) AS Year,
COUNT(*) AS Orders,
GROUPING ( Employee ) AS GE,
GROUPING ( Year ) AS GY
FROM employee LEFT OUTER JOIN sales_order
ON employee.emp_id = sales_order.sales_rep
WHERE employee.sex IN (’F’)
AND employee.state IN (’TX’, ’CA’, ’NY’)
GROUP BY ROLLUP (Year, Employee)
ORDER BY Year, Employee
The table that follows represents the result set from the query.
Employee Year Orders GE GY
(NULL) (NULL) 5 1 0
(NULL) (NULL) 169 1 1
102 (NULL) 1 0 0
390 (NULL) 1 0 0
1062 (NULL) 1 0 0
1090 (NULL) 1 0 0
1507 (NULL) 1 0 0
(These employees are not sales representatives and therefore do not have any sales to record.)
(NULL) 2000 98 1 0
667 2000 34 0 0
949 2000 31 0 0
1142 2000 33 0 0
(NULL) 2001 66 1 0
667 2001 20 0 0
949 2001 22 0 0
1142 2001 24 0 0
Using ROLLUP
The ROLLUP operation provides subtotals of aggregate rows. It adds
subtotal rows into the result sets of queries with GROUP BY clauses.
ROLLUP generates a result set showing aggregates for a hierarchy of values
in the selected columns. Use the ROLLUP operation when you want a result
set showing totals and subtotals.
Example 1 In the example that follows, the query returns data that summarizes the
number of sales orders by year and quarter.
SELECT year (order_date) AS Year, quarter (order_date) AS
Quarter, COUNT (*) AS Orders
FROM sales_order
GROUP BY ROLLUP (Year, Quarter)
ORDER BY Year, Quarter
The table that follows represents the result set from the query. The subtotal
rows are highlighted in the result set. Each subtotal row has a NULL in the
column or columns over which the subtotal is computed.
The row marked [1] represents the total number of orders across both years
(2000, 2001) and all quarters. This row has NULL in both the Year and
Quarter columns and is the row where all columns were excluded from the
prefix.
Note
Every ROLLUP operation returns a result set with one row where NULL
appears in each column except for the aggregate column. This row
represents the summary of each column to the aggregate function. For
example, if SUM were the aggregate function in question, this row would
represent the grand total of all values.
The rows marked [2] represent the total number of orders in the
years 2000 and 2001, respectively. Both rows have NULL in the Quarter
column because the values in that column are rolled up to give a subtotal for
Year.
Note
The number of rows like this in your result set depends on the number of
variables that appear in your ROLLUP query.
The remaining rows marked [3] provide summary information by giving the
total number of orders for each quarter in both years.
Example 2 Here is another example of the ROLLUP operation that returns a slightly
more complicated result set than the first example. The result set
summarizes the number of sales orders by year, quarter, and region. In this
example, only the first and second quarters and two selected regions (Canada
and the Eastern region) are examined.
SELECT year (order_date) AS Year, quarter (order_date) AS
Quarter, region, COUNT (*) AS Orders
FROM sales_order
WHERE region IN (’Canada’, ’Eastern’) AND quarter IN (’1’, ’2’)
GROUP BY ROLLUP (Year, Quarter, Region)
ORDER BY Year, Quarter, Region
The table that follows represents the result set from the query. Each subtotal
row has a NULL in the column or columns over which the subtotal is
computed.
Year Quarter Region Orders
(NULL) (NULL) (NULL) 183 [1]
2000 (NULL) (NULL) 68 [2]
2000 1 (NULL) 36 [3]
2000 1 Canada 3 [4]
2000 1 Eastern 33 [4]
2000 2 (NULL) 32 [3]
2000 2 Canada 3 [4]
2000 2 Eastern 29 [4]
2001 (NULL) (NULL) 115 [2]
2001 1 (NULL) 57 [3]
2001 1 Canada 11 [4]
2001 1 Eastern 46 [4]
2001 2 (NULL) 58 [3]
2001 2 Canada 4 [4]
2001 2 Eastern 54 [4]
The first row [1] is an aggregate over all rows, and has NULL in the Year,
Quarter, and Region columns. The value in the Orders column of this row
represents the total number of orders in Canada and the Eastern region in
Quarters 1 and 2 in the years 2000 and 2001.
The rows marked [2] represent the total number of sales orders in each year
(2000) and (2001) in Quarters 1 and 2 in Canada and the Eastern region. The
sum of the values in these rows [2] equals the grand total represented in row [1].
The rows marked [3] provide data about the total number of orders for the
given year and quarter by region.
The rows marked [4] provide data about the total number of orders for each
year, each quarter, and each region in the result set.
Using CUBE
The CUBE operation is an OLAP feature. OLAP is a way of organizing data
to fit the way you analyze and manage it so that it takes less time and effort
to generate the result sets that you need. You precalculate the summary
values for the reports, which speeds up report calculation. CUBES are useful
when you want to work with a large amount of data in your reports.
Like ROLLUP, the CUBE operator provides subtotals of aggregate values in
the result set. When the CUBE operation is performed on variables, the
result set includes many subtotal rows based on combinations of the values
of the variables. Unlike ROLLUP, CUBE returns all possible combinations
of the variables that you specify in the query.
The CUBE operator returns a result set with added information of
dimensions to the data. You can further analyze columns of your data, which
are referred to as dimensions. CUBE provides a cross tabulation report of all
possible combinations of the dimensions and generates a result set that
shows aggregates for all combinations of values in selected columns.
Note
CUBE is particularly useful when your dimensions are not a part of the
same hierarchy.
For more information about rows returned by CUBE queries, see “About
OLAP operations” on page 331. For more information about using
GROUP BY clauses in queries, see “GROUP BY clause” [ASA SQL
Reference, page 492].
Example In the example that follows, the query returns a result set that summarizes
the total number of orders and then calculates subtotals for the number of
orders by year and quarter.
Note
As the number of variables that you want to compare increases, so too does
the complexity of the cube that the query returns.
SELECT year (order_date) AS Year, quarter (order_date) AS
Quarter, COUNT (*) AS Orders
FROM sales_order
GROUP BY CUBE (Year, Quarter)
ORDER BY Year, Quarter
The table that follows represents the result set from the query. The subtotal
rows are highlighted in the result set. Each subtotal row has a NULL in the
column or columns over which the subtotal is computed.
The first highlighted row [1] represents the total number of orders across
both years and all quarters. The value in the Orders column is the sum of the
values in the two rows marked [3]. It is also the sum of the four values in
the rows marked [2].
Note
All CUBE operations return result sets with at least one row where NULL
appears in each column except for the aggregate column. This row
represents the summary of each column to the aggregate function.
The next set of highlighted rows [2] represents the total number of orders by
quarter across both years. The following two rows marked by [3] represent
the total number of orders across all quarters first in the year 2000 and then
in the year 2001.
Understanding CUBE The CUBE operation is equivalent to a GROUPING SETS query that
includes all possible combinations of variables.
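The CUBE query above could equivalently be written with GROUPING SETS; a sketch (the empty grouping set () supplies the grand-total row, following the SQL standard):
SELECT year (order_date) AS Year, quarter (order_date) AS Quarter,
   COUNT (*) AS Orders
FROM sales_order
GROUP BY GROUPING SETS ( (Year, Quarter), (Year), (Quarter), () )
ORDER BY Year, Quarter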
For information about the GROUPING SETS syntax used in the preceding
SQL query, see “Using GROUPING SETS” on page 343.
Example The following example shows how you can tailor the results that are returned
from a query using GROUPING SETS. The query specifies that the result
set returns subtotal rows for each year, but not for the individual quarters.
SELECT year (order_date) AS Year, quarter (order_date) AS
Quarter, COUNT (*) AS Orders
FROM sales_order
GROUP BY GROUPING SETS ((Year, Quarter), (Year))
ORDER BY Year, Quarter
The table that follows represents the result set from the query. The subtotal
rows are highlighted.
Year Quarter Orders
Working with OLAP functions
OLAP functions provide the capability to perform analytic tasks on data,
such as computing moving averages, ranks, and cumulative values. You can
include an OLAP function in a select-list or on the ORDER BY clause of a
SELECT statement. An OLAP function cannot be used as an argument of an
aggregate function. Therefore, you cannot have functions like
SUM( RANK() ).
When using an OLAP function, a window is specified that defines the rows
over which the function is applied, and in what order. The set of rows is
defined, relative to the current row, as either a range or a number of rows
preceding and following. For example, you can calculate an average over the
previous three month period.
The following OLAP functions are available:
♦ Rank functions Let you compile a list of values from your data set in
ranked order. For example, listing the sales representatives in decreasing
order of total sales for each quarter.
♦ Reporting functions Let you compare a non-aggregate value to an
aggregate value. For example, listing all quarters in which expenses are
less than the average.
♦ Window functions Let you analyze your data by defining a moving
window for it. For example, calculating a moving average, such as the
NASDAQ figure, over a 30-day period.
Rank functions
Rank functions let you compute a rank value for each row in a result set
based on an ordering specified in the query. For example, a sales manager
might need to identify the top or bottom sales people in the company, the
highest- or lowest-performing sales region, or the best- or worst-selling
products. Rank functions provide this information.
The following statistical functions are classified as rank functions:
♦ CUME_DIST function Computes the position of one row returned from
a query with respect to the other rows returned by the query, as defined by
the ORDER BY clause. For information, see “CUME_DIST function
[Aggregate]” [ASA SQL Reference, page 120].
♦ RANK function Computes the rank of each row returned from a query
with respect to the other rows returned by the query, as defined by the
ORDER BY clause. For information, see “RANK function [Aggregate]”
[ASA SQL Reference, page 193].
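The query for Example 1 is missing from the text here. Judging from the result set below, and from Example 2, which adds a PARTITION BY clause to the same query, it ranks the Utah employees in descending order of salary, along these lines:
SELECT emp_lname, salary, sex,
   RANK () OVER (ORDER BY salary DESC) "Rank"
FROM employee
WHERE state IN ('UT')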
The table that follows represents the result set from the query.
emp_lname salary sex Rank
Shishov 72995.00 F 1
Wang 68400.00 M 2
Cobb 62000.00 M 3
Morris 61300.00 M 4
Diaz 54900.00 M 5
Driscoll 48023.00 M 6
Hildebrand 45829.00 F 7
Goggin 37900.00 M 8
Rebeiro 34576.00 M 9
Bigelow 31200.00 F 10
Lynch 24903.00 M 11
Example 2 You can partition your data to provide different results. Using the query
from example #1, you can change the data by partitioning it by gender. The
following example ranks employees in descending order by salary and
partitions by gender.
SELECT emp_lname, salary, sex,
RANK () OVER (PARTITION BY sex
ORDER BY salary DESC) "Rank"
FROM employee
WHERE state IN (’UT’)
The table that follows represents the result set from the query.
Example 3 The following example finds a list of female employees in Utah and Arizona
and ranks them in descending order according to salary. The
PERCENT_RANK function is used to provide a cumulative total in
descending order.
SELECT emp_lname, salary,
PERCENT_RANK () OVER (ORDER BY salary DESC) "Rank"
FROM employee
WHERE state IN (’UT’,’AZ’) AND sex IN (’F’)
The table that follows represents the result set from the query.
emp_lname salary Rank
Shishov 72995.00 0
Jordan 51432.00 0.25
Hildebrand 45829.00 0.5
Bigelow 31200.00 0.75
Bertrand 29800.00 1
Example 4 You can use rank functions to find the top or bottom percentiles in the data
set. In the following example, the query returns male employees whose
salary is in the top five percent of the data set.
SELECT *
FROM ( SELECT emp_lname, salary,
PERCENT_RANK () OVER (ORDER BY salary DESC) "Rank"
FROM employee
WHERE sex IN (’M’) )
AS DT ( last_name, salary, percent )
WHERE percent < 0.05
The table that follows represents the result set from the query.
last_name salary percent
Scott 96300.00 0
Sheffield 87900.00 0.025
Lull 87900.00 0.025
Reporting functions
Reporting functions let you compare non-aggregate values to aggregate
values. For example, a salesperson might need to compile a list of all
customers who ordered more than the average number of a product in a
specified year, or a manager might want to compare an employee’s salary
against the average salary of the department.
You can use any aggregate function in conjunction with reporting functions.
For a list of the available aggregate functions, see “Aggregate functions”
[ASA SQL Reference, page 86].
Note
You can use the same statistical functions, with the exception of LIST, with
window functions.
Example 1 The following query returns a result set that shows a list of the products that
sold higher than the average number of sales. The result set is partitioned by
year.
SELECT *
FROM (SELECT year(order_date) AS Year, prod_id,
SUM( quantity ) AS Q,
AVG (SUM(quantity))
OVER (PARTITION BY Year) AS Average
FROM sales_order JOIN sales_order_items
GROUP BY year(order_date), prod_id
ORDER BY Year)
AS derived_table
WHERE Q > Average
The table that follows represents the result set from the query.
For the year 2000, the average number of orders was 1787. Four products
(700, 601, 600, and 400) sold higher than that amount. In 2001, the average
number of orders was 1048 and three products exceeded that amount.
Example 2 The following query returns a result set that shows the employees whose
salary is one standard deviation greater than the average salary of their
department. Standard deviation is a measure of how much the data varies
from the mean.
SELECT *
FROM (SELECT
emp_lname AS Employee,
dept_id AS Dept,
CAST( salary as DECIMAL( 10, 2) ) AS Salary,
CAST( AVG( salary )
OVER (PARTITION BY dept_id) AS DECIMAL (10,2) ) AS
Average,
CAST (STDDEV_POP (salary)
OVER (PARTITION BY dept_id) AS DECIMAL (10, 2 ) ) AS
Std_Dev
FROM employee
GROUP BY Dept, Employee, Salary)
AS derived_table
WHERE Salary > Average + Std_Dev
ORDER BY Dept, Salary, Employee
The table that follows represents the result set from the query. Every
department has at least one employee whose salary significantly deviates
from the mean.
Window functions
Window functions let you analyze your data by computing aggregate values
over windows surrounding each row. The result set returns a summary value
representing a set of rows. You can use window functions to calculate a
moving average of the sales figures for a company over a specified time
period.
You can use any aggregate function (except LIST) in conjunction with
window functions.
For a list of the available aggregate functions, see “Aggregate functions”
[ASA SQL Reference, page 86].
Note
You can use the same statistical functions with reporting functions.
The illustration that follows shows a partition and the window view that it
contains. The window is defined by a certain number of rows and descends
through the partition.
[Illustration: a partition, with the start and end of the window falling between the start and end of the partition]
Example The following example shows a window function. The query returns a result
set that partitions the data by department and then provides a cumulative
summary of employees’ salaries starting with the employee who has been at
the company the longest. The result set includes only those employees who
reside in California, Utah, New York, or Arizona. The column Sum Salary
provides the cumulative total of employees’ salaries.
SELECT dept_id, emp_lname, start_date, salary,
SUM(salary) OVER (PARTITION BY dept_id
ORDER BY start_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"
FROM employee
WHERE state IN (’CA’, ’UT’, ’NY’, ’AZ’) AND dept_id IN (’100’,
’200’)
ORDER BY dept_id, start_date;
The table that follows represents the result set from the query. The result set
is partitioned by department.
For example, the cumulative total of salaries for employees in department 100 who live in California, Utah, New York, and Arizona is $434,091.69, and the cumulative total for employees in department 200 is $250,200.00.
CHAPTER 11
Using Subqueries
About this chapter When you create a query, you use WHERE and HAVING clauses to restrict
the rows that the query returns.
Sometimes, the rows you select depend on information stored in more than
one table. A subquery in the WHERE or HAVING clause allows you to
select rows from one table according to specifications obtained from another
table. Additional ways to do this can be found in “Joins: Retrieving Data
from Several Tables” on page 263.
Before you start This subquery chapter assumes some knowledge of queries and the syntax of
the select statement. Information about queries appears in “Queries:
Selecting Data from a Table” on page 213.
Contents Topic: page
Introduction to subqueries
A relational database stores information about different types of objects in
different tables. For example, you should store information particular to
products in one table, and information that pertains to sales orders in another.
The product table contains the information about the various products. The
sales order items table contains information about customers’ orders.
In general, only the simplest questions can be answered using only one table.
For example, if the company reorders products when there are fewer than 50
of them in stock, then it is possible to answer the question “Which products
are nearly out of stock?” with this query:
SELECT id, name, description, quantity
FROM product
WHERE quantity < 50
However, if “nearly out of stock” depends on how many items of each type
the typical customer orders, the number “50” will have to be replaced by a
value obtained from the sales_order_items table.
Structure of the subquery A subquery is structured like a regular query, and appears in the main
query’s SELECT, FROM, WHERE, or HAVING clause. Continuing with
the previous example, you can use a subquery to select the average number
of items that a customer orders, and then use that figure in the main query to
find products that are nearly out of stock. The following query finds the
names and descriptions of the products whose in-stock quantities are less than
twice the average number of items of each type that a customer orders.
SELECT name, description
FROM product WHERE quantity < 2 * (
SELECT avg(quantity)
FROM sales_order_items
)
In the WHERE clause, subqueries help select the rows from the tables listed
in the FROM clause that appear in the query results. In the HAVING clause,
they help select the row groups, as specified by the main query’s
GROUP BY clause, that appear in the query results.
This is a two-step query: first, find the average number of items requested
per order; and then find which products in stock number less than double
that quantity.
The query in two steps The quantity column of the sales_order_items table stores the number of
items requested per item type, customer, and order. The subquery is
SELECT avg(quantity)
FROM sales_order_items
Subqueries in the HAVING clause
Although you usually use subqueries as search conditions in the WHERE
clause, sometimes you can also use them in the HAVING clause of a query.
When a subquery appears in the HAVING clause, like any expression in the
HAVING clause, it is used as part of the row group selection.
Here is a request that lends itself naturally to a query with a subquery in the
HAVING clause: “Which products’ average in-stock quantity is more than
double the average number of each item ordered per customer?”
Example SELECT name, avg(quantity)
FROM product
GROUP BY name
HAVING avg(quantity) > 2* (
SELECT avg(quantity)
FROM sales_order_items
)
name avg(product.quantity)
Shorts 80
The query executes as follows:
♦ The subquery calculates the average quantity of items in the
sales_order_items table.
♦ The main query then goes through the product table, calculating the
average in-stock quantity of each product, grouping by product name.
♦ The HAVING clause then checks if each average quantity is more than
double the quantity found by the subquery. If so, the main query returns
that row group; otherwise, it doesn’t.
♦ The SELECT clause produces one summary row for each group,
displaying the name of each product and its in-stock average quantity.
You can also use outer references in a HAVING clause, as shown in the
following example, a slight variation on the one above.
Example ”Find the product ID numbers and line ID numbers of those products whose
average ordered quantities is more than half the in-stock quantities of those
products.”
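The query for this example is not shown here. A formulation consistent with the request and with the discussion below, which names the outer reference sales_order_items.prod_id, would be:
SELECT prod_id, line_id
FROM sales_order_items
GROUP BY prod_id, line_id
HAVING 2 * AVG( quantity ) > (
   SELECT quantity
   FROM product
   WHERE product.id = sales_order_items.prod_id )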
prod_id line_id
401 2
401 1
401 4
501 3
... ...
In this example, the subquery must produce the in-stock quantity of the
product corresponding to the row group being tested by the HAVING clause.
The subquery selects records for that particular product, using the outer
reference sales_order_items.prod_id.
A subquery with a comparison returns a single value This query uses the comparison “>”, suggesting that the subquery must return exactly one value. In this case, it does. Since the id field of the product table is a primary key, there is only one record in the product table corresponding to any particular product id.
Subquery tests
The chapter “Queries: Selecting Data from a Table” on page 213 describes
simple search conditions you can use in the HAVING clause. Since a
subquery is just an expression that appears in the WHERE or HAVING
clauses, the search conditions on subqueries may look familiar.
They include:
♦ Subquery comparison test Compares the value of an expression to a
single value produced by the subquery for each record in the table(s) in
the main query.
♦ Quantified comparison test Compares the value of an expression to
each of the set of values produced by a subquery.
♦ Subquery set membership test Checks if the value of an expression
matches one of the set of values produced by a subquery.
♦ Existence test Checks if the subquery produces any rows.
Subquery comparison test
The subquery comparison test (=, <>, <, <=, >, >=) is a modified version
of the simple comparison test. The only difference between the two is that in
the former, the expression following the operator is a subquery. This test is
used to compare a value from a row in the main query to a single value
produced by the subquery.
Example This query contains an example of a subquery comparison test:
SELECT name, description, quantity
FROM product
WHERE quantity < 2 * (
SELECT avg(quantity)
FROM sales_order_items)
The subquery computes the average quantity of items ordered; the main query then compares the quantity of each in-stock item to that value.
A subquery in a comparison test returns one value A subquery in a comparison test must return exactly one value. Consider this query, whose subquery extracts two columns from the sales_order_items table:
SELECT name, description, quantity
FROM product
WHERE quantity < 2 * (
SELECT avg(quantity), max (quantity)
FROM sales_order_items)
It returns the error Subquery allowed only one select list item.
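The query that produces the following result set is missing here; it is the ANY form of the SOME query shown at the end of this section:
SELECT id, cust_id
FROM sales_order
WHERE order_date > ANY (
   SELECT ship_date
   FROM sales_order_items
   WHERE id = 2005 )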
id cust_id
2006 105
2007 106
2008 107
2009 108
... ...
In executing this query, the main query tests the order dates for each order
against the shipping dates of every product of the order #2005. If an order
date is greater than the shipping date for one shipment of order #2005, then
that id and customer id from the sales_order table are part of the result set.
The ANY test is thus analogous to the OR operator: the above query can be
read, “Was this sales order placed after the first product of the order #2005
was shipped, or after the second product of order #2005 was shipped, or. . . ”
Understanding the ANY operator The ANY operator can be a bit confusing. It is tempting to read the query as “Return those orders placed after any products of order #2005 were
shipped.” But this means the query will return the order IDs and customer
IDs for the orders placed after all products of order #2005 were
shipped—which is not what the query does.
Instead, try reading the query like this: “Return the order and customer IDs
for those orders placed after at least one product of order #2005 was
shipped.” Using the keyword SOME may provide a more intuitive way to
phrase the query. The following query is equivalent to the previous query.
SELECT id, cust_id
FROM sales_order
WHERE order_date > SOME (
SELECT ship_date
FROM sales_order_items
WHERE id=2005)
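The query that produces the following result set is likewise missing; based on the discussion below and on the negated form shown later in this chapter, it is the ALL form:
SELECT id, cust_id
FROM sales_order
WHERE order_date > ALL (
   SELECT ship_date
   FROM sales_order_items
   WHERE id = 2001 )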
id cust_id
2002 102
2003 103
2004 104
2005 101
... ...
In executing this query, the main query tests the order dates for each order
against the shipping dates of every product of order #2001. If an order date
is greater than the shipping date for every shipment of order #2001, then the
id and customer id from the sales_order table are part of the result set. The
ALL test is thus analogous to the AND operator: the above query can be
read, “Was this sales order placed after the first product of order #2001 was
shipped, and after the second product of order #2001 was shipped, and. . . ”
Notes about the ALL operator There are three additional important characteristics of the ALL test:
♦ Empty subquery result set If the subquery produces an empty result
set, the ALL test returns TRUE. This makes sense, since if there are no
results, then it is true that the comparison test holds for every value in the
result set.
♦ NULL values in subquery result set If the comparison test is false for
any values in the result set, the ALL search returns FALSE. It returns
TRUE if all values are true. Otherwise, it returns UNKNOWN—for
example, this can occur if there is a NULL value in the subquery result
set but the search condition is TRUE for all non-NULL values.
♦ Negating the ALL test The following expressions are not equivalent.
NOT a = ALL (subquery)
a <> ALL (subquery)
Testing set membership with IN conditions
You can use the subquery set membership test to compare a value from the
main query to more than one value in the subquery.
The subquery set membership test compares a single data value for each row
in the main query to the single column of data values produced by the
subquery. If the data value from the main query matches one of the data
values in the column, the subquery returns TRUE.
Example Select the names of the employees who head the Shipping or Finance
departments:
SELECT emp_fname, emp_lname
FROM employee
WHERE emp_id IN (
SELECT dept_head_id
FROM department
WHERE (dept_name=’Finance’ or dept_name = ’Shipping’))
emp_fname emp_lname
Jose Martinez
The subquery extracts from the department table the id numbers that correspond to the
heads of the Shipping and Finance departments. The main query then
returns the names of the employees whose id numbers match one of the two
found by the subquery.
Set membership test is equivalent to =ANY test The subquery set membership test is equivalent to the =ANY test. The following query is equivalent to the query from the above example.
SELECT emp_fname, emp_lname
FROM employee
WHERE emp_id =ANY (
SELECT dept_head_id
FROM department
WHERE (dept_name=’Finance’ or dept_name = ’Shipping’))
Negation of the set membership test You can also use the subquery set membership test to extract those rows whose column values are not equal to any of those produced by a subquery.
To negate a set membership test, insert the word NOT in front of the
keyword IN.
Example The following query returns the first and last names of the employees who are not heads of the Finance or Shipping departments.
SELECT emp_fname, emp_lname
FROM employee
WHERE emp_id NOT IN (
SELECT dept_head_id
FROM department
WHERE (dept_name=’Finance’ OR dept_name = ’Shipping’))
Existence test
Subqueries used in the subquery comparison test and set membership test
both return data values from the subquery table. Sometimes, however, you
may be more concerned with whether the subquery returns any results,
rather than which results. The existence test (EXISTS) checks whether a
subquery produces any rows of query results. If the subquery produces one
or more rows of results, the EXISTS test returns TRUE. Otherwise, it returns
FALSE.
Example Here is an example of a request expressed using a subquery: “Which
customers placed orders after July 13, 2001?”
SELECT fname, lname
FROM customer
WHERE EXISTS (
SELECT *
FROM sales_order
WHERE (order_date > ’2001-07-13’) AND
(customer.id = sales_order.cust_id))
fname lname
Almen de Joie
Grover Pendelton
Bubba Murphy
Explanation of the existence test Here, for each row in the customer table, the subquery checks if that customer ID corresponds to one that has placed an order after July 13, 2001.
If it does, the query extracts the first and last names of that customer from
the main table.
The EXISTS test does not use the results of the subquery; it just checks if
the subquery produces any rows. So the existence test applied to the
following two subqueries returns the same results. These are subqueries and
cannot be processed on their own, because they refer to the customer table
which is part of the main query, but not part of the subquery.
☞ For more information, see “Correlated subqueries” on page 373.
SELECT *
FROM sales_order
WHERE (order_date > ’2001-07-13’) AND (customer.id = sales_order.cust_id)
SELECT ship_date
FROM sales_order
WHERE (order_date > ’2001-07-13’) AND (customer.id = sales_order.cust_id)
It does not matter which columns from the sales_order table appear in the
SELECT statement, though by convention, the “SELECT *” notation is
used.
Negating the existence test You can reverse the logic of the EXISTS test using the NOT EXISTS form. In this case, the test returns TRUE if the subquery produces no rows, and
FALSE otherwise.
Correlated subqueries You may have noticed that the subquery contains a reference to the id
column from the customer table. A reference to columns or expressions in
the main table(s) is called an outer reference and the subquery is said to be
correlated. Conceptually, SQL processes the above query by going through
the customer table, and performing the subquery for each customer. If the
order date in the sales_order table is after July 13, 2001, and the customer ID
in the customer and sales_order tables match, then the first and last names
from the customer table appear. Since the subquery references the main
query, the subquery in this section, unlike those from previous sections,
returns an error if you attempt to run it by itself.
Outer references
Within the body of a subquery, it is often necessary to refer to the value of a
column in the active row of the main query. Consider the following query:
SELECT name, description
FROM product
WHERE quantity < 2 * (
SELECT avg(quantity)
FROM sales_order_items
WHERE product.id = sales_order_items.prod_id)
This query extracts the names and descriptions of the products whose
in-stock quantities are less than double the average ordered quantity of that
product—specifically, the product being tested by the WHERE clause in the
main query. The subquery does this by scanning the sales_order_items table.
But the product.id column in the WHERE clause of the subquery refers to a
column in the table named in the FROM clause of the main query—not the
subquery. As SQL moves through each row of the product table, it uses the
id value of the current row when it evaluates the WHERE clause of the
subquery.
Description of an outer reference The product.id column in this subquery is an example of an outer reference. A subquery that uses an outer reference is a correlated subquery. An outer
reference is a column name that does not refer to any of the columns in any
of the tables in the FROM clause of the subquery. Instead, the column name
refers to a column of a table specified in the FROM clause of the main query.
As the above example shows, the value of a column in an outer reference
comes from the row currently being tested by the main query.
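The query discussed here is not reproduced at this point; it is the same subquery form that appears again later in the chapter:
SELECT order_date, sales_rep
FROM sales_order
WHERE cust_id IN (
   SELECT id
   FROM customer
   WHERE lname = 'Clarke' OR fname = 'Suresh' )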
Order_date sales_rep
2001-01-05 1596
2000-01-27 667
2000-11-11 467
2001-02-04 195
... ...
The subquery yields a list of customer IDs that correspond to the two
customers whose names are listed in the WHERE clause, and the main query
finds the order dates and sales representatives corresponding to those two
people’s orders.
Replacing a subquery with a join The same question can be answered using joins. Here is an alternative form of the query, using a two-table join:
SELECT order_date, sales_rep
FROM sales_order, customer
WHERE cust_id=customer.id AND
(lname = ’Clarke’ OR fname = ’Suresh’)
This form of the query joins the sales_order table to the customer table to
find the orders for each customer, and then returns only those records for
Suresh and Clarke.
Some joins cannot be written as subqueries Both of these queries find the correct order dates and sales representatives, and neither is more right than the other. Many people will find the subquery
form more natural, because the request doesn’t ask for any information
about customer IDs, and because it might seem odd to join the sales_order
and customer tables together to answer the question.
If, however, the request changes to include some information from the
customer table, the subquery form no longer works. For example, to answer the request “When did Mrs. Clarke and Suresh place their orders, and by which representatives, and what are their full names?”, it is necessary to include the customer table in the main query’s FROM clause:
SELECT fname, lname, order_date, sales_rep
FROM sales_order, customer
WHERE cust_id=customer.id AND (lname = ’Clarke’ OR fname =
’Suresh’)
Nested subqueries
As we have seen, subqueries always appear in the HAVING clause or the
WHERE clause of a query. A subquery may itself contain a WHERE clause
and/or a HAVING clause, and, consequently, a subquery may appear in
another subquery. Subqueries inside other subqueries are called nested
subqueries.
Examples List the order IDs and line IDs of those orders shipped on the same day
when any item in the fees department was ordered.
SELECT id, line_id
FROM sales_order_items
WHERE ship_date = ANY (
SELECT order_date
FROM sales_order
WHERE fin_code_id IN (
SELECT code
FROM fin_code
WHERE (description = ’Fees’)))
id line_id
2001 1
2001 2
2001 3
2002 1
... ...
Explanation of the nested subqueries ♦ In this example, the innermost subquery produces a column of financial codes whose descriptions are “Fees”:
SELECT code
FROM fin_code
WHERE (description = ’Fees’)
♦ The next subquery finds the order dates of the items whose codes match
one of the codes selected in the innermost subquery:
SELECT order_date
FROM sales_order
WHERE fin_code_id IN (subquery)
♦ Finally, the outermost query finds the order IDs and line IDs of the orders
shipped on one of the dates found in the subquery.
SELECT id, line_id
FROM sales_order_items
WHERE ship_date = ANY (subquery)
Nested subqueries can also have more than three levels. Though there is no
maximum number of levels, queries with three or more levels take
considerably longer to run than do smaller queries.
Correlated subqueries
In a simple query, the database server evaluates and processes the query’s
WHERE clause once for each row. Sometimes, though, a subquery in the WHERE clause returns only one result, making it unnecessary for the database
server to evaluate it more than once for the entire result set.
Uncorrelated subqueries Consider this query:
SELECT name, description
FROM product
WHERE quantity < 2 * (
SELECT avg(quantity)
FROM sales_order_items)
In this example, the subquery calculates exactly one value: the average
quantity from the sales_order_items table. In evaluating the query, the
database server computes this value once, and compares each value in the
quantity field of the product table to it to determine whether to select the
corresponding row.
Correlated subqueries When a subquery contains an outer reference, you cannot use this shortcut.
For instance, the subquery in the query
SELECT name, description
FROM product
WHERE quantity < 2 * (
SELECT avg(quantity)
FROM sales_order_items
WHERE product.id=sales_order_items.prod_id)
returns a value dependent upon the active row in the product table. Such a
subquery is called a correlated subquery. In these cases, the subquery might
return a different value for each row of the outer query, making it necessary
for the database server to perform more than one evaluation.
The Adaptive Server Anywhere optimizer automatically rewrites many multi-level queries to use joins. The conversion is carried out without any user action.
This section describes which subqueries can be converted to joins so you can
understand the performance of queries in your database.
Example The question “When did Mrs. Clarke and Suresh place their orders, and by
which sales representatives?” can be written as a two-level query:
SELECT order_date, sales_rep
FROM sales_order
WHERE cust_id IN (
SELECT id
FROM customer
WHERE lname = ’Clarke’ OR fname = ’Suresh’)
An alternate, and equally correct way to write the query uses joins:
SELECT fname, lname, order_date, sales_rep
FROM sales_order, customer
WHERE cust_id=customer.id AND
(lname = ’Clarke’ OR fname = ’Suresh’)
The criteria that must be satisfied in order for a multi-level query to be able
to be rewritten with joins differ for the various types of operators. Recall that
when a subquery appears in the query’s WHERE clause, it is of the form
SELECT select-list
FROM table
WHERE
[NOT] expression comparison-operator ( subquery )
| [NOT] expression comparison-operator { ANY | SOME } ( subquery )
| [NOT] expression comparison-operator ALL ( subquery )
| [NOT] expression IN ( subquery )
| [NOT] EXISTS ( subquery )
GROUP BY group-by-expression
HAVING search-condition
Whether a subquery can be converted to a join depends on a number of
factors, such as the type of operator and the structures of the query and of
the subquery.
Comparison operators
A subquery that follows a comparison operator (=, <>, <, <=, >, >=) must
satisfy certain conditions if it is to be converted into a join. Subqueries that
follow comparison operators are valid only if they return exactly one value
for each row of the main query. In addition to this criterion, a subquery is
converted to a join only if the subquery
♦ does not contain a GROUP BY clause
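The convertible case that the following paragraph contrasts with is not shown here. As a sketch (the order and line numbers are purely illustrative), a comparison subquery that returns exactly one value for each row of the main query, such as
SELECT name, description
FROM product
WHERE id = (
   SELECT prod_id
   FROM sales_order_items
   WHERE id = 2001 AND line_id = 1 )
can be rewritten by the optimizer as a join of the product and sales_order_items tables.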
However, the request, “Find the products whose in-stock quantities are less
than double the average ordered quantity” cannot be converted to a join, as
the subquery contains the aggregate function avg:
SELECT name, description
FROM product
WHERE quantity < 2 * (
SELECT avg(quantity)
FROM sales_order_items)
♦ The conjunct ‘expression comparison-operator ALL (subquery)’ must not
be negated.
However, the request, “When did Ms. Clarke, Suresh, and any employee
who is also a customer, place their orders?” would be phrased as a union
query, and thus cannot be converted to a join:
SELECT order_date, sales_rep
FROM sales_order
WHERE cust_id = ANY (
SELECT id
FROM customer
WHERE lname = ’Clarke’ OR fname = ’Suresh’
UNION
SELECT emp_id
FROM employee)
Similarly, the request “Find the order IDs and customer IDs of those orders
that were not placed after all products of order #2001 were shipped,” is
naturally expressed with a subquery:
SELECT id, cust_id
FROM sales_order
WHERE NOT order_date > ALL (
SELECT ship_date
FROM sales_order_items
WHERE id=2001)
However, the request “Find the order IDs and customer IDs of those orders
not shipped after the first shipping dates of all the products” would be
phrased as the aggregate query:
SELECT id, cust_id
FROM sales_order
WHERE NOT order_date > ALL (
SELECT first (ship_date)
FROM sales_order_items )
SELECT select-list
FROM table
WHERE expression comparison-operator ANY( subquery )
and
SELECT select-list
FROM table
WHERE NOT expression comparison-operator ANY( subquery )
are not.
Logical equivalence of ANY and ALL expressions This is because the first two queries are in fact equivalent, as are the last two. Recall that the ANY operator is analogous to the OR operator, but with a variable number of arguments; and that the ALL operator is similarly
analogous to the AND operator. Just as the expression
NOT ((X > A) AND (X > B))
is equivalent to the expression
(X <= A) OR (X <= B)
the expression
NOT order_date > ALL (
SELECT first (ship_date)
FROM sales_order_items )
is equivalent to the expression
order_date <= ANY (
SELECT first (ship_date)
FROM sales_order_items )
Operator Negation
= <>
< >=
> <=
<= >
>= <
<> =
Similarly, the request “Find the names of employees who are not heads of the
Finance or Shipping departments” is formulated as the negated subquery
SELECT emp_fname, emp_lname
FROM employee
WHERE NOT emp_id IN (
SELECT dept_head_id
FROM department
WHERE (dept_name=’Finance’ OR dept_name = ’Shipping’))
So the query
SELECT emp_fname, emp_lname
FROM employee
WHERE emp_id IN (
SELECT dept_head_id
FROM department
WHERE (dept_name=’Finance’ or dept_name = ’Shipping’))
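The continuation of this passage is missing; presumably it notes that this non-negated IN subquery can be converted to a join, which would look something like the following sketch:
SELECT emp_fname, emp_lname
FROM employee, department
WHERE emp_id = dept_head_id
   AND (dept_name = 'Finance' OR dept_name = 'Shipping')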
Existence test
A subquery that follows the keyword EXISTS is converted to a join only if it
satisfies the following two conditions:
♦ The main query does not contain a GROUP BY clause, and is not an
aggregate query, or the subquery returns exactly one value.
♦ The conjunct ‘EXISTS (subquery)’ is not negated.
♦ The subquery is correlated; that is, it contains an outer reference.
Example Therefore, the request, “Which customers placed orders after July 13,
2001?”, which can be formulated by this query whose non-negated subquery
contains the outer reference customer.id = sales_order.cust_id, could be
converted to a join.
SELECT fname, lname
FROM customer
WHERE EXISTS (
SELECT *
FROM sales_order
WHERE (order_date > ’2001-07-13’)
AND (customer.id = sales_order.cust_id))
The EXISTS keyword essentially tells the database server to check for
empty result sets. When using inner joins, the database server automatically
displays only the rows where there is data from all of the tables in the FROM
clause. So, this query returns the same rows as does the one with the
subquery:
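The join itself is not reproduced here; following the pattern used elsewhere in this chapter, it would be:
SELECT fname, lname
FROM customer, sales_order
WHERE (order_date > '2001-07-13') AND (customer.id = sales_order.cust_id)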
CHAPTER 12
Adding, Changing, and Deleting Data
About this chapter This chapter describes how to modify the data in a database.
Most of the chapter is devoted to the INSERT, UPDATE, and DELETE
statements, as well as statements for bulk loading and unloading.
Contents Topic: page
Data modification statements
The statements you use to add, change, or delete data are called data
modification statements. The most common such statements include:
♦ Insert adds new rows to a table
You can omit the list of column names if you provide a value for each
column in the table, in the order in which they appear when you execute a
query using SELECT *.
INSERT from SELECT You can use SELECT within an INSERT statement to pull values from one
or more tables. If the table you are inserting data into has a large number of
columns, you can also use WITH AUTO NAME to simplify the syntax.
Using WITH AUTO NAME, you only need to specify the column names in
the SELECT statement, rather than in both the INSERT and the SELECT
statements. The names in the SELECT statement must be column references
or aliased expressions.
A simplified version of the syntax for the INSERT statement using a select
statement is:
INSERT [ INTO ] table-name
WITH AUTO NAME select-statement
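No example accompanies this syntax here; a minimal sketch, which adds the same row as the INSERT ... VALUES statements shown below (dummy is the one-row system table), might be:
INSERT INTO department
WITH AUTO NAME
SELECT 703 AS dept_id,
   'Western Sales' AS dept_name
FROM dummy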
☞ For more information about the INSERT statement, see the “INSERT
statement” [ASA SQL Reference, page 506].
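The INSERT ... VALUES statement that the notes below describe is not shown here; it presumably resembles the following:
INSERT INTO department (dept_id, dept_name)
VALUES ( 703, 'Western Sales' )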
Notes ♦ Type the values in the same order as the column names in the original
CREATE TABLE statement, that is, first the ID number, then the name,
then the department head ID.
♦ Surround the values by parentheses.
♦ Enclose all character data in single quotes.
♦ Use a separate insert statement for each row you add.
The dept_head_id column has no default, but can allow NULL. A NULL is
assigned to that column.
The order in which you list the column names must match the order in which
you list the values. The following example produces the same results as the
previous one:
INSERT INTO department (dept_name, dept_id )
VALUES (’Western Sales’, 703)
Values for unspecified columns When you specify values for only some of the columns in a row, one of four things can happen to the columns with no values specified:
♦ NULL entered NULL appears if the column allows NULL and no
default value exists for the column.
♦ A default value entered The default value appears if a default exists for
the column.
♦ A unique, sequential value entered A unique, sequential value
appears if the column has the AUTOINCREMENT default or the
IDENTITY property.
♦ INSERT rejected, and an error message appears An error message
appears if the column does not allow NULL and no default exists.
By default, columns allow NULL unless you explicitly state NOT NULL in
the column definition when creating tables. You can alter the default using
the ALLOW_NULLS_BY_DEFAULT option.
Restricting column data using constraints You can create constraints for a column or domain. Constraints govern the kind of data you can or cannot add.
☞ For more information on constraints, see “Using table and column
constraints” on page 89.
Explicitly inserting NULL You can explicitly insert NULL into a column by typing NULL. Do not
enclose this in quotes, or it will be taken as a string.
For example, the following statement explicitly inserts NULL into the
dept_head_id column:
INSERT INTO department
VALUES (703, ’Western Sales’, NULL )
Using defaults to supply values You can define a column so that, even though the column receives no value, a default value automatically appears whenever a row is inserted. You do
this by supplying a default for the column.
☞ For more information about defaults, see “Using column defaults” on
page 83.
You can also use a SELECT statement to add values to some of the columns in a row just as you do with the VALUES clause. Simply specify the
columns to which you want to add data in the INSERT clause.
Inserting Data from the Same Table You can insert data into a table based on other data in the same table. Essentially, this means copying all or part of a row.
For example, you can insert new products, based on existing products, into
the product table. The following statement adds new Extra Large Tee Shirts
(of Tank Top, V-neck, and Crew Neck varieties) into the product table. The
identification number of each new shirt is ten greater than that of the corresponding existing shirt:
INSERT INTO product
SELECT id+ 10, name, description,
’Extra large’, color, 50, unit_price
FROM product
WHERE name = ’Tee Shirt’
Changing data using UPDATE
You can use the UPDATE statement, followed by the name of the table or
view, to change single rows, groups of rows, or all rows in a table. As in all
data modification statements, you can change the data in only one table or
view at a time.
The UPDATE statement specifies the row or rows you want changed and the
new data. The new data can be a constant or an expression that you specify
or data pulled from other tables.
If an UPDATE statement violates an integrity constraint, the update does not
take place and an error message appears. For example, if one of the values
being added is the wrong data type, or if it violates a constraint defined for
one of the columns or data types involved, the update does not take place.
UPDATE syntax A simplified version of the UPDATE syntax is:
UPDATE table-name
SET column_name = expression
WHERE search-condition
If the company Newton Ent. (in the customer table of the sample database)
is taken over by Einstein, Inc., you can update the name of the company
using a statement such as the following:
UPDATE customer
SET company_name = ’Einstein, Inc.’
WHERE company_name = ’Newton Ent.’
You can use any expression in the WHERE clause. If you are not sure how
the company name was spelled, you could try updating any company called
Newton, with a statement such as the following:
UPDATE customer
SET company_name = ’Einstein, Inc.’
WHERE company_name LIKE ’Newton%’
The search condition need not refer to the column being updated. The
company ID for Newton Entertainments is 109. As the ID value is the
primary key for the table, you could be sure of updating the correct row
using the following statement:
UPDATE customer
SET company_name = ’Einstein, Inc.’
WHERE id = 109
The SET clause The SET clause specifies the columns to be updated, and their new values.
The WHERE clause determines the row or rows to be updated. If you do not
have a WHERE clause, the specified columns of all rows are updated with the values given in the SET clause.
The FROM clause You can use a FROM clause to pull data from one or more tables into the
table you are updating.
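No example follows at this point; a sketch of an UPDATE that pulls data from another table (the five-percent increase and the use of order 2001 are purely illustrative) could be:
-- Raise the price of every product that appears on order 2001
UPDATE product
SET unit_price = unit_price * 1.05
FROM product JOIN sales_order_items
   ON product.id = sales_order_items.prod_id
WHERE sales_order_items.id = 2001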
Changing data using INSERT
You can use the ON EXISTING clause of the INSERT statement to update
existing rows in a table (based on primary key lookup) with new values. This
clause can only be used on tables that have a primary key. Attempting to use
this clause on tables without primary keys or on proxy tables generates a
syntax error.
Specifying the ON EXISTING clause causes the server to do a primary key
lookup for each input row. If the corresponding row does not exist, it inserts
the new row. For rows already existing in the table, you can choose to:
♦ generate an error for duplicate key values. This is the default behavior if
the ON EXISTING clause is not specified.
♦ silently ignore the input row, without generating any errors.
♦ update the existing row with the values in the input row
☞ For more information, see the “INSERT statement” [ASA SQL Reference,
page 506].
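For example, a minimal sketch of the third choice, which updates the row if it already exists (assuming dept_id is the primary key of the department table):
INSERT INTO department (dept_id, dept_name)
ON EXISTING UPDATE
VALUES ( 703, 'Sales West' )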
This allows you to delete rows even if they contain primary keys referenced
by a foreign key, but does not permit a COMMIT unless the corresponding
foreign key is deleted also.
The following view displays products and the value of that product that has
been sold:
CREATE VIEW ProductPopularity as
SELECT product.id,
SUM(product.unit_price * sales_order_items.quantity) as
"Value Sold"
FROM product JOIN sales_order_items
ON product.id = sales_order_items.prod_id
GROUP BY product.id
Using this view, you can delete those products which have sold less than
$20,000 from the product table.
DELETE
FROM product
FROM product NATURAL JOIN ProductPopularity
WHERE "Value Sold" < 20000
You should roll back your changes when you have completed the example:
ROLLBACK
You can remove all rows from a table with the TRUNCATE TABLE statement. For example, to remove all the data in the sales_order table, type the
following:
TRUNCATE TABLE sales_order
A TRUNCATE TABLE statement does not fire triggers defined on the table.
CHAPTER 13
Query Optimization and Execution
About this chapter Once each query is parsed, the optimizer analyzes it and decides on an
access plan that will compute the result using as few resources as possible.
This chapter describes the steps the optimizer goes through to optimize a
query. It documents the assumptions that underlie the design of the
optimizer, and discusses selectivity estimation, cost estimation, and the other
steps of optimization.
Although update, insert, and delete statements must also be optimized, the
focus of this chapter is on select queries. The optimization of these other
commands follows similar principles.
Contents Topic: page
Indexes 426
The role of the optimizer
The role of the optimizer is to devise an efficient way to execute SQL
statements. The optimizer expresses its chosen method in the form of an
access plan. The access plan describes which tables to scan, which index, if
any, to use for each table, which join strategy to use, and what order to read
the tables in. Often, a great number of plans exist that all accomplish the
same goal. Other variables may further enlarge the number of possible
access plans.
Cost based The optimizer begins selecting from among the choices available using efficient, and
in some cases proprietary, algorithms. It bases its decisions on predictions of
the resources each query requires. The optimizer takes into account both the
cost of disk access operations and the estimated CPU cost of each operation.
Syntax independent Most commands may be expressed in many different ways using the SQL
language. These expressions are semantically equivalent in that they
accomplish the same task, but may differ substantially in syntax. With few
exceptions, the Adaptive Server Anywhere optimizer devises a suitable
access plan based only on the semantics of each statement.
Syntactic differences, although they may appear to be substantial, usually
have no effect. For example, differences in the order of predicates, tables,
and attributes in the query syntax have no effect on the choice of access plan.
Neither is the optimizer affected by whether or not a query contains a view.
A good plan, not necessarily the best plan The goal of the optimizer is to find a good access plan. Ideally, the optimizer would identify the most efficient access plan possible, but this goal is often
impractical. Given a complicated query, a great number of possibilities may
exist.
However efficient the optimizer, analyzing each option takes time and
resources. The optimizer compares the cost of further optimization with the
cost of executing the best plan it has found so far. If a plan has been devised
that has a relatively low cost, the optimizer stops and allows execution of
that plan to proceed. Further optimization might consume more resources
than would execution of an access plan already found.
In the case of expensive and complicated queries, the optimizer works
longer. In the case of very expensive queries, it may run long enough to
cause a discernible delay.
☞ For more information about reading access plans, see “Reading access
plans” on page 451.
Optimizer estimates
The optimizer chooses a strategy for processing a statement based on
histograms that are stored in the database and heuristics (educated guesses).
Histograms, also called column statistics, store information about the
distribution of values in a column. In Adaptive Server Anywhere, a
histogram represents the data distribution for a column by dividing the
domain of the column into a set of consecutive value ranges (also called
buckets) and by remembering, for each value range (or bucket), the number
of rows in the table for which the column value falls in the bucket.
Adaptive Server Anywhere pays particular attention to single column values
that are represented in a large number of rows in the table. Significant single
value selectivities are maintained in singleton histogram buckets (that is, buckets that encompass a single value in the column domain).
ASA tries to maintain a minimum number of singleton buckets in each
histogram, usually between 10 and 100 depending upon the size of the table.
Additionally, all single values with selectivities greater than 1% are kept as
singleton buckets. As a result, a histogram for a given column remembers
the top N single value selectivities for the column where the value of N is
dependent upon the size of the table and the number of single value
selectivities that are greater than 1%.
Once the minimum number of value ranges have been met, low-selectivity
frequencies are replaced by large-selectivity frequencies as they come along.
The histogram will only have more than the minimum number of singleton
value ranges after it has seen enough values with a selectivity of greater than
1%.
Given the histogram on a column, Adaptive Server Anywhere attempts to
estimate the number of rows satisfying a given query predicate on the
column by adding up the number of rows in all value ranges that overlap the
values satisfying the specified predicate. For value ranges in the histograms
that are partially contained in the query result set, Adaptive Server
Anywhere uses interpolation within the value range.
Adaptive Server Anywhere uses an implementation of histograms that
causes the histograms to be more refined as a by-product of query execution.
As queries are executed, Adaptive Server Anywhere compares the number of
rows estimated by the histograms for a given predicate with the number of
rows actually found to satisfy the predicate, and then adjusts the values in
the histogram to reduce the margin of error for subsequent optimizations.
For each table in a potential execution plan, the optimizer estimates the
number of rows that will form part of the results. The number of rows
depends on the size of the table and the restrictions in the WHERE clause or
the ON clause of the query.
In many cases, the optimizer uses more sophisticated heuristics. For
example, the optimizer uses default estimates only in cases where better
statistics are unavailable. As well, the optimizer makes use of indexes and
keys to improve its guess of the number of rows. The following are a few
single-column examples:
♦ Equating a column to a value: estimate one row when the column has a
unique index or is the primary key.
♦ A comparison of an indexed column to a constant: probe the index to
estimate the percentage of rows that satisfy the comparison.
♦ Equating a foreign key to a primary key (key join): use relative table sizes
in determining an estimate. For example, if a 5000 row table has a
foreign key to a 1000 row table, the optimizer guesses that there are five
foreign key rows for each primary key row.
☞ For more information about column statistics, see “SYSCOLSTAT
system table” [ASA SQL Reference, page 665].
☞ For information about obtaining the selectivities of predicates and the
distribution of column values, see:
♦ “sa_get_histogram system procedure” [ASA SQL Reference, page 759]
♦ “The Histogram utility” [ASA Database Administration Guide, page 490]
♦ “ESTIMATE function [Miscellaneous]” [ASA SQL Reference, page 137]
♦ “ESTIMATE_SOURCE function [Miscellaneous]” [ASA SQL Reference,
page 138]
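For example, the following statement is a sketch against the sample database (the exact argument forms of these functions are documented in the references above); it asks the server for its selectivity estimate for a predicate on the quantity column, and for the source of that estimate:

SELECT FIRST
    ESTIMATE( quantity, 20, '>' ) AS estimated_selectivity,
    ESTIMATE_SOURCE( quantity, 20, '>' ) AS estimate_source
FROM sales_order_items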
Column statistics are stored permanently in the database in the system table
SYSCOLSTAT. Statistics are automatically updated if a significant amount
of data is changed using the INSERT, UPDATE or DELETE statements.
If you suspect that performance is suffering because your statistics
inaccurately reflect the current column values, you may want to execute the
statements DROP STATISTICS or CREATE STATISTICS. CREATE
STATISTICS deletes old statistics and creates new ones, while DROP
STATISTICS just deletes the existing statistics.
With more accurate statistics available to it, the optimizer can compute
better estimates, thus improving the performance of subsequent queries.
However, incorrect estimates are only a problem if they lead to poorly
optimized queries.
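For example, the following statements are a sketch (see the CREATE STATISTICS and DROP STATISTICS references below for the full syntax); they recreate and then delete the statistics for a single column:

-- Recreate the column statistics for prod_id
CREATE STATISTICS sales_order_items ( prod_id )

-- Delete the column statistics for prod_id
DROP STATISTICS sales_order_items ( prod_id )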
When you execute LOAD TABLE, statistics are created for the table.
However, when rows are inserted, deleted or updated in a table, the statistics
are not updated.
For small tables, a histogram does not significantly improve the optimizer’s
ability to choose an efficient plan. You can specify the minimum table size
for which histograms are created. The default is 1000 rows. However, when
a CREATE STATISTICS statement is executed, a histogram is created for
every table, regardless of the number of rows.
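For example, the following statement is a sketch (the option name and value format should be verified in the reference below); it lowers the threshold so that histograms are also maintained for smaller tables:

SET OPTION PUBLIC.MIN_TABLE_SIZE_FOR_HISTOGRAM = '100'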
☞ For more information, see “MIN_TABLE_SIZE_FOR_HISTOGRAM
option [database]” [ASA Database Administration Guide, page 627].
☞ For more information about column statistics, see:
♦ “SYSCOLSTAT system table” [ASA SQL Reference, page 665]
♦ “DROP STATISTICS statement” [ASA SQL Reference, page 442]
♦ “CREATE STATISTICS statement” [ASA SQL Reference, page 377]
Queries often optimize differently at the second execution. For the above
type of constraint, Adaptive Server Anywhere learns from experience,
automatically allowing for columns that have an unusual distribution of
values. The database stores this information permanently unless you
explicitly delete it using the DROP STATISTICS command. Note that
subsequent queries with predicates over that column may cause the server to
recreate a histogram on the column.
Underlying assumptions
A number of assumptions underlie the design direction and philosophy of
the Adaptive Server Anywhere query optimizer. You can improve the quality
or performance of your own applications through an understanding of the
optimizer’s decisions. These assumptions provide a context in which you
may understand the information contained in the remaining sections.
If you suspect that the statistics are inaccurate and performance is slow, you
may want to execute DROP STATISTICS and/or CREATE STATISTICS.
Often, Adaptive Server Anywhere can evaluate predicates with the aid of an
index. Using an index, the optimizer speeds access to data and reduces the
amount of information read. For example, when OPTIMIZATION_GOAL is
set to first-row, the Adaptive Server Anywhere optimizer will try to use
indexes to satisfy ORDER BY and GROUP BY clauses.
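For example, the following statement is a sketch (the option and its values are described in the reference later in this chapter); it asks the optimizer to favor plans that return the first rows quickly for the current connection:

SET TEMPORARY OPTION OPTIMIZATION_GOAL = 'first-row'

Setting the option to 'all-rows', the default, favors plans with the lowest estimated total cost instead.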
When the optimizer cannot find a suitable index, it resorts to a sequential
table scan, which can be expensive. An index can improve performance
dramatically when joining tables. Add indexes to tables or rewrite queries
wherever doing so facilitates the efficient processing of common requests.
Adaptive Server Anywhere avoids creating unnecessary work tables and may
more easily identify a suitable index through which to access a table.
Non-correlated subqueries are subqueries that contain no explicit reference
to the table or tables in the higher-level portions of the query.
The following is an ordinary query that contains a non-correlated subquery.
It selects information about all the customers who did not place an order on
January 1, 2001.
Non-correlated subquery SELECT *
FROM customer c
WHERE c.id NOT IN
( SELECT o.cust_id
FROM sales_order o
WHERE o.order_date = ’2001-01-01’ )
One possible way to evaluate this query is to first read the sales_order table
and create a work table of all the customers who placed orders on January 1,
2001, then read the customer table and extract one row for each customer
listed in the work table.
However, Adaptive Server Anywhere avoids materializing results as work
tables. It also gives preference to plans that return the first few rows of a
result most quickly. Thus, the optimizer rewrites such queries using EXISTS
predicates. In this form, the subquery becomes correlated: the subquery
now contains an explicit reference to the id column of the customer table.
Correlated subquery SELECT *
FROM customer c
WHERE NOT EXISTS
( SELECT *
FROM sales_order o
WHERE o.order_date = ’2001-01-01’
AND o.cust_id = c.id )
c<seq> : o<key_so_customer>
This query is semantically equivalent to the one above, but when expressed
in this new syntax, two advantages become apparent.
1. The optimizer can choose to use either the index on the cust_id attribute
or the order_date attribute of the sales_order table. (However, in the
sample database, only the id and cust_id columns are indexed.)
help because customer identification numbers are unique in the customer
table.
☞ Further information on subquery caching is located in “Subquery and
function caching” on page 449.
Steps in optimization
The steps the Adaptive Server Anywhere optimizer follows in generating a
suitable access plan include the following.
1. The parser converts the SQL query into an internal representation. It may
rewrite the query, converting it to a syntactically different, but
semantically equivalent, form. For example, a subquery may be rewritten
as a join. These conversions make the statement easier to analyze.
2. Optimization proper commences just before execution. If you are using
cursors in your application, optimization commences when the cursor is
opened. Unlike many other commercial database systems, Adaptive
Server Anywhere optimizes each statement just before executing it.
3. The optimizer performs semantic optimization on the statement. It
rewrites each SQL statement whenever doing so leads to better, more
efficient access plans.
4. The optimizer performs join enumeration for each subquery.
5. The optimizer optimizes access order.
Because Adaptive Server Anywhere saves statistics each time it executes a
query, the optimizer can learn from the experience of executing previous
plans and can adjust its choices when appropriate.
Simple queries If a query is recognized as a simple query, a heuristic rather than cost-based
optimization is used—the optimizer decides whether to use an index scan
or a sequential table scan, and builds and executes the access plan
immediately. Steps 4 and 5 are bypassed.
A simple query is a DYNAMIC SCROLL or NO SCROLL cursor that does
not contain any kind of subquery, more than one table, a proxy table, user
defined functions, NUMBER(*), UNION, aggregation, DISTINCT, GROUP
BY, or more than one predicate on a single column. Simple queries can
contain ORDER BY only as long as the WHERE clause contains conditions
on each primary key column.
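For example, a statement of the following form is a sketch against the sample database of a query that would typically be treated as a simple query:

SELECT emp_lname, emp_fname
FROM employee
WHERE emp_id = 105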
Index scans
An index scan uses an index to determine which rows satisfy a search
condition. It reads only those pages that contain matching rows. Indexes can
return rows in sorted order.
Index scans are displayed in the short and long plan as
correlation_name<index_name>, where correlation_name is the
correlation name specified in the FROM clause, or the table name if none
was specified; and index_name is the name of the index.
Indexes provide an efficient mechanism for reading a few rows from a large
table. However, index scans cause pages to be read from the database in
random order, which is more expensive than sequential reads. Index scans
may also reference the same table page multiple times if there are several
rows on the page that satisfy the search condition. If only a few pages are
matched by the index scan, it is likely that the pages will remain in cache,
and multiple access does not lead to extra I/O. However, if many pages are
matched by the search condition, they may not all fit in cache. This can lead
to the index scan reading the same page from disk multiple times.
The optimizer will tend to prefer index scans over sequential table scans if
the OPTIMIZATION_GOAL setting is first-row. This is because indexes
tend to return the first few rows of a query faster than table scans.
Indexes can also be used to satisfy an ordering requirement, either explicitly
defined in an ORDER BY clause, or implicitly needed for a GROUP BY or
DISTINCT clause. Ordered group-by and ordered distinct methods can
return initial rows faster than hash-based grouping and distinct, but they may
be slower at returning the entire result set.
The optimizer uses an index scan to satisfy a search condition if the search
condition is sargable, and if the optimizer’s estimate of the selectivity of the
search condition is sufficiently low for the index scan to be cheaper than a
sequential table scan.
An index scan can also evaluate non-sargable search conditions after rows
are fetched from the index. Evaluating conditions in the index scan is
slightly more efficient than evaluating them in a filter after the index scan.
☞ For more information about when Adaptive Server Anywhere can make
use of indexes, see “Predicate analysis” on page 436.
☞ For more information about optimization goals, see
“OPTIMIZATION_GOAL option [database]” [ASA Database Administration
Guide, page 632].
A sequential table scan reads all the rows in all the pages of a table in the
order in which they are stored in the database.
Sequential table scans are displayed in the short and long plan as
correlation_name<seq>, where correlation_name is the correlation name
specified in the FROM clause, or the table name if none was specified.
This type of scan is used when it is likely that a majority of table pages
contain rows that match the query’s search condition, or when no suitable
index is defined.
Although sequential table scans may read more pages than index scans, the
disk I/O can be substantially cheaper because the pages are read in
contiguous blocks from the disk (this performance improvement is best if the
database file is not fragmented on the disk). Sequential I/O minimizes
overhead due to disk head movement and rotational latency. For large tables,
sequential table scans also read groups of several pages at a time. This
further reduces the cost of sequential table scans relative to index scans.
Although sequential table scans may take less time than index scans that
match many rows, they also cannot exploit the cache as effectively as index
scans if the scan is executed many times. Since index scans are likely to
access fewer table pages, it is more likely that the pages will be available in
the cache, resulting in faster access. Because of this, it is much better to have
an index scan for table accesses that are repeated, such as the right hand side
of a nested loops join.
For isolation level 3, Adaptive Server Anywhere acquires a lock on each row
that is accessed—even if it does not satisfy the search condition. For this
level, sequential table scans acquire locks on all of the rows in the table,
while index scans only acquire locks on the rows that match the search
condition. This means that sequential table scans may substantially reduce
the throughput in multi-user environments. For this reason, the optimizer
prefers indexed access over sequential access at isolation level 3. Sequential
scans can efficiently evaluate simple comparison predicates between table
columns and constants during the scan. Other search conditions that refer
only to the table being scanned are evaluated after these simple comparisons,
and this approach is slightly more efficient than evaluating the conditions in a
filter after the sequential scan.
to this structure:
table1<seq>*JF ( arbitrary dfos... ( HTS JNB
table2<idx> ) )
where idx is an index that you can use to probe the join key values stored in
the hash table.
When there are intervening operators between the hash join and the scan,
the index probes reduce the number of rows that need to be processed by
those operators.
This strategy is most useful when the index probes are highly selective, for
example, when the number of rows in the build side is small compared to the
cardinality of the index.
Note
If the build side of the hash join is large, it is more effective to do a regular
sequential scan.
The optimizer computes a threshold build size, similar to how it computes
the threshold for the hash join alternate execution, beyond which the (HTS
JNB table<idx>) will be treated as a sequential scan (table<seq>)
during execution.
Note
The sequential strategy is used if the build side of the hash table has to spill
to disk.
multiple input/output operations at the same time. Each scan carried out in
parallel searches for rows matching a different key and ensures that these
pages are in memory when needed.
Note
Parallel index scans are not supported for compressed indexes, nor at
isolation level 3.
IN list
The IN list algorithm is used in cases where an IN predicate can be satisfied
using an index. For example, in the following query, the optimizer
recognizes that it can access the employee table using its primary key index.
SELECT *
FROM employee
WHERE emp_id in (102, 105, 129)
In order to accomplish this, a join is built with a special IN list table on the
left hand side. Rows are fetched from the IN list and used to probe the
employee table. Multiple IN lists can be satisfied using the same index. If
the optimizer chooses not to use an index to satisfy the IN predicate (perhaps
because another index gives better performance), then the IN list appears as
a predicate in a filter.
Recursive table
A recursive table is a common table expression constructed as a result of a
WITH clause in a query. The WITH clause is used for recursive union
queries. Common table expressions are temporary views that are known
only within the scope of a single SELECT statement.
☞ For more information, see “Common Table Expressions” on page 307.
Join algorithms
Join algorithms are required when more than one table appears in the FROM
clause. You cannot specify which join algorithm is used—the choice is made
by the optimizer.
The nested loops join computes the join of its left and right sides by
completely reading the right hand side for each row of the left hand side.
(The syntactic order of tables in the query does not matter, because the
optimizer chooses the appropriate join order for each block in the request.)
The optimizer may choose nested loops join if the join condition does not
contain an equality condition, or if it is optimizing for first-row time.
Since a nested loops join reads the right hand side many times, it is very
sensitive to the cost of the right hand side. If the right hand side is an index
scan or a small table, then the right hand side can likely be computed using
cached pages from previous iterations. On the other hand, if the right hand
side is a sequential table scan or an index scan that matches many rows, then
the right hand side needs to be read from disk many times. Typically, a
nested loops join is less efficient than other join methods. However, nested
loops join can provide the first matching row quickly compared to join
methods that must compute their entire result before returning.
Nested loops join is the only join algorithm that can provide sensitive
semantics for queries containing joins. This means that sensitive cursors on
joins can only be executed with a nested loops join.
A semijoin fetches only the first matching row from the right hand side. It is
a more efficient version of the nested loops join, but can only be used when
an EXISTS, or sometimes a DISTINCT, keyword is used.
Similar to the nested loop join described above, the nested loops semijoin
algorithm joins each row of the left-hand side with the right-hand side using
a nested loops algorithm. As with nested loops join, the right-hand side may
be read many times, so for larger inputs an index scan is preferable.
However, nested loops semijoin differs from nested loop join in two
respects. First, semijoin only outputs values from the left-hand side; the
right hand side is used only for restricting which rows of the left-hand side
appear in the result. Second, the nested loops semijoin algorithm stops each
search of the right-hand side as soon as the first match is encountered.
Nested loops semijoin can be used as the join algorithm when the join’s inputs
include table expressions from an existentially-quantified (IN, SOME, ANY,
EXISTS) nested query that has been rewritten as a join.
The nested block join (also called the block nested loops join) reads a block
of rows from the left hand side, and sorts the rows by the join attributes (the
columns used in the join conditions). The left hand child of a nested block
join is called a sorted block node. For each block of rows with equal join
attributes, the right hand side is scanned once. This algorithm improves on
the nested loops join if there are several rows on the left hand side that join
with each row of the right hand side.
A nested block join will be chosen by the optimizer if the left hand side has
many rows with the same values for join attributes and the right hand side
has an index that satisfies the search condition.
Every nested block join has a left child that is a sorted block node. The cost
shown for this node is the cost of reading and sorting the rows from the left
input.
The left hand input is read into memory in blocks. Changes to tables in the
left hand input may not be visible in the results. Because of this, a nested
block join cannot provide sensitive semantics.
A nested block join locks the rows of its left input before they are copied
into memory.
The hash join algorithm builds an in-memory hash table of the smaller of its
two inputs, and then reads the larger input and probes the in-memory hash
table to find matches, which are written to a work table. If the smaller input
does not fit into memory, the hash join operator partitions both inputs into
smaller work tables. These smaller work tables are processed recursively
until the smaller input fits into memory.
The hash join algorithm has the best performance if the smaller input fits
into memory, regardless of the size of the larger input. In general, the
optimizer will choose hash join if one of the inputs is expected to be
substantially smaller than the other.
If the hash join algorithm executes in an environment where there is not
enough cache memory to hold all the rows that have a particular value of the
join attributes, then it is not able to complete. In this case, the hash join
method discards the interim results and an index-based nested loops join is
used instead. All of the rows of the smaller table are read and used to probe
the work table to find matches. This index-based strategy is significantly
slower than other join methods, and the optimizer will avoid generating
access plans using a hash join if it detects that a low memory situation may
occur during query execution. When the nested loops strategy is needed due
to low memory, a performance counter is incremented. You can read this
monitor with the QueryLowMemoryStrategy database or connection
property, or in the “Query: Low Memory Strategies” counter in Windows
Performance Monitor.
Note: Windows Performance Monitor may not be available on Windows
CE, 95, 98, or Me.
☞ For more information, see QueryLowMemoryStrategy in
“Connection-level properties” [ASA Database Administration Guide, page 665].
The hash semijoin variant of the hash join algorithm performs a semijoin
between the left-hand side and the right-hand side. As with nested loop
semijoin described above, the right-hand side is only used to determine
which rows from the left-hand side appear in the result. With hash semijoin
the right-hand side is read to form an in-memory hash table which is
subsequently probed by each row from the left-hand side. As soon as any
match is found, the left-hand row is output to the result and the match
process starts again for the next left-hand row. At least one equality join
condition must be present in order for hash semijoin to be considered by the
query optimizer. As with nested loop semijoin, hash semijoin will be utilized
in cases where the join’s inputs include table expressions from an
existentially-quantified (IN, SOME, ANY, EXISTS) nested query that has
been rewritten as a join. Hash semijoin will tend to outperform nested loop
semijoin when the join condition includes inequalities, or if a suitable index
does not exist to make indexed retrieval of the right-hand side sufficiently
inexpensive.
As with hash join, the hash semijoin algorithm may revert to a nested loops
semijoin strategy if there is insufficient cache memory to enable the
operation to complete. Should this occur, a performance counter is
incremented. You can read this monitor with the QueryLowMemoryStrategy
database or connection property, or in the Query: Low Memory Strategies
counter in Windows Performance Monitor.
Note: Windows Performance Monitor may not be available on Windows
CE, 95, 98, or Me.
☞ For more information, see QueryLowMemoryStrategy in
“Connection-level properties” [ASA Database Administration Guide, page 665].
The hash antisemijoin variant of the hash join algorithm performs an
antisemijoin between the left-hand side and the right-hand side. As with
nested loops antisemijoin, the right-hand side is only used to
determine which rows from the left-hand side appear in the result. With hash
antisemijoin the right-hand side is read to form an in-memory hash table
which is subsequently probed by each row from the left-hand side. Each
left-hand row is output only if it fails to match any row from the right-hand
side. As with nested loop antisemijoin, hash antisemijoin will be utilized in
cases where the join’s inputs include table expressions from a
universally-quantified (NOT IN, ALL, NOT EXISTS) nested query that has
been rewritten as an anti-join. Hash antisemijoin will tend to outperform
nested loop antisemijoin when the join condition includes inequalities, or if
a suitable index does not exist to make indexed retrieval of the right-hand
side sufficiently inexpensive.
As with hash join, the hash antisemijoin algorithm may revert to a nested
loops antisemijoin strategy if there is insufficient cache memory to enable
the operation to complete. Should this occur, a performance counter is
incremented. You can read this monitor with the QueryLowMemoryStrategy
database or connection property, or in the Query: Low Memory Strategies
counter in Windows Performance Monitor.
Note: Windows Performance Monitor may not be available on Windows
CE, 95, 98, or Me.
☞ For more information, see QueryLowMemoryStrategy in
“Connection-level properties” [ASA Database Administration Guide, page 665].
The merge join algorithm reads two inputs which are both ordered by the
join attributes. For each row of the left input, the algorithm reads all of the
matching rows of the right input by accessing the rows in sorted order.
If the inputs are not already ordered by the join attributes (they may be,
perhaps because of an earlier merge join or because an index was used to
satisfy a search condition), the optimizer adds a sort to produce the correct row order.
This sort adds cost to the merge join.
One advantage of a merge join compared to a hash join is that the cost of
sorting can be amortized over several joins, provided that the merge joins are
over the same attributes. The optimizer will choose merge join over a hash
join if the sizes of the inputs are likely to be similar, or if it can amortize the
cost of the sort over several operations.
The recursive hash join is a variant of the hash join algorithm that is used in
recursive union queries.
☞ For more information, see “Hash Join algorithm” on page 412, and
“Recursive common table expressions” on page 316.
The recursive left outer hash join is a variant of the hash join algorithm used
in certain recursive union queries.
☞ For more information, see “Hash Join algorithm” on page 412, and
“Recursive common table expressions” on page 316.
The hash distinct algorithm reads its input, and builds an in-memory hash
table. If an input row is found in the hash table, it is ignored; otherwise it is
written to a work table. If the input does not completely fit into the
in-memory hash table, it is partitioned into smaller work tables, and
processed recursively.
The hash distinct algorithm works very well if the distinct rows fit into an
in-memory table, irrespective of the total number of rows in the input.
The hash distinct uses a work table, and as such can provide insensitive or
value sensitive semantics.
If the hash distinct algorithm executes in an environment where there is very
little cache memory available, then it will not be able to complete. In this
case, the hash distinct method discards its interim results, and the indexed
distinct algorithm is used instead. The optimizer avoids generating access
plans using the hash distinct algorithm if it detects that a low memory
situation may occur during query execution.
The hash distinct returns a row as soon as it finds one that has not previously
been returned. However, the results of a hash distinct must be fully
materialized before returning from the query. If necessary, the optimizer
adds a work table to the execution plan to ensure this.
Hash distinct locks the rows of its input.
Ordered Distinct algorithm
If the input is ordered by all the columns, then an ordered distinct can be
used. This algorithm reads each row and compares it to the previous row. If
it is the same, it is ignored; otherwise, it is output. The ordered distinct is
effective if rows are already ordered (perhaps because of an index or a merge
join); if the input is not ordered, the optimizer inserts a sort. No work table is
used by the ordered distinct itself, but one is used by any inserted sort.
The indexed distinct algorithm maintains a work table of the unique rows
from the input. As rows are read from the input, an index on the work table
is searched to find a previously seen duplicate of the input row. If one is
found, the input row is ignored. Otherwise, the input row is inserted into the
work table. The work table index is created on all the columns of the
SELECT list; in order to improve index performance, a hash expression is
included as the first expression. This hash expression is a computed value
embodying the values of all the columns in the SELECT list.
The indexed distinct method returns distinct rows as they are encountered.
This allows it to return the first few rows quickly compared to other
duplicate elimination methods. The indexed distinct algorithm only stores
two rows in memory at a time, and can work well in extremely low memory
situations. However, if the number of distinct rows is large, the execution
cost of the indexed distinct algorithm is typically worse than hash distinct.
The work table used to store distinct rows may not fit in cache, leading to
rereading work table pages many times in a random access pattern.
Since the indexed distinct method uses a work table, it cannot provide fully
sensitive semantics; however, it also does not provide fully insensitive
semantics, and another work table is required for insensitive cursors.
The indexed distinct method locks the rows of its input.
Grouping algorithms
Grouping algorithms compute a summary of their input. They are applicable
only if the query contains a GROUP BY clause, or if the query contains
aggregate functions (such as SELECT COUNT(*) FROM T).
☞ For more information, see “Hash Group By algorithm” on page 417,
“Ordered Group By algorithm” on page 418, “Indexed Group By algorithm”
on page 418, and “Single Group By algorithm” on page 418.
In some cases, values in the grouping columns of the input table are
clustered, so that similar values appear close together. For example, if a table
contains a column that is always set to the current date, all rows with a single
date are relatively close within the table. The Clustered Hash Group By
algorithm exploits this clustering.
The optimizer may use Clustered Hash Group By when grouping tables that
are significantly larger than the available memory. In particular, it is effective
when the HAVING predicate returns only a small proportion of rows.
The Clustered Hash Group By algorithm can lead to significant wasted work
on the part of the optimizer if it is chosen in an environment where data is
updated concurrently with query execution. Clustered Hash Group By is
therefore most appropriate for OLAP workloads characterized by occasional
batch-style updates and read-based queries. Set the
OPTIMIZATION_WORKLOAD option to OLAP to indicate to the
optimizer that it should include the Clustered Hash Group By algorithm in
the possibilities it investigates.
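For example, the following statement is a sketch (the option and its values are described in the reference below):

SET OPTION PUBLIC.OPTIMIZATION_WORKLOAD = 'OLAP'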
☞ For more information, see “OPTIMIZATION_WORKLOAD option
[database]” [ASA Database Administration Guide, page 632].
This algorithm may be used when it is not possible to use the Ordered Group
By Sets algorithm because of the lack of a suitable index.
Merge sort
The sort operator reads its input into memory, sorts it in memory, and then
outputs the sorted results. If the input does not completely fit into memory,
then several sorted runs are created and then merged together. Sort does not
return any rows until it has read all of the input rows. Sort locks its input
rows.
If the merge sort algorithm executes in an environment where there is very
little cache memory available, it may not be able to complete. In this case,
the merge sort orders the remainder of the input using an index-based sort
method. Input rows are read and inserted into a work table, and an index is
built on the ordering columns of the work table. Rows are then read
from the work table using a complex index scan. This index-based
strategy is significantly slower than an in-memory sort. The optimizer
avoids generating access plans using a merge sort algorithm if it detects that
a low memory situation may occur during query execution. When the
index-based strategy is needed due to low memory, a performance counter is
incremented; you can read this monitor with the QueryLowMemoryStrategy
property, or in the “Query: Low Memory Strategies” counter in Windows
Performance Monitor.
Sort performance is affected by the size of the sort key, the row size, and the
total size of the input. For large rows, it may be cheaper to use a VALUES
SENSITIVE cursor. In that case, columns in the SELECT list are not copied
into the work tables used by the sort. While the sort does not write output
rows to a work table, the results of the sort must be materialized before
rows are returned to the application. If necessary, the optimizer adds a work
table to ensure this.
Union all
The union all algorithm reads rows from each of its inputs and outputs them,
regardless of duplicates. This algorithm is used to implement UNION and
UNION ALL clauses. In the UNION case, a duplicate elimination algorithm
is needed to remove any duplicates generated by the union all.
Recursive union
The recursive union algorithm is employed during the execution of recursive
union queries.
☞ For more information, see “Recursive common table expressions” on
page 316.
Sort Top N
The Sort Top N algorithm is used for queries that contain a TOP N clause
and an ORDER BY clause. It is an efficient algorithm for sorting only those
rows required in the result set.
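For example, a query of the following form is a sketch against the sample database; the sort needs to retain only the ten rows required for the result:

SELECT TOP 10 *
FROM sales_order
ORDER BY order_date DESC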
☞ For more information, see “SELECT statement” [ASA SQL Reference,
page 575].
Miscellaneous algorithms
The following are additional methods that may be used in an access plan.
Here we are joining R to S and T. We will have read all of the rows of R
before reading any row from T, and we can immediately reject rows of T that
cannot possibly join with R. This reduces the number of rows that must be processed.
Lock algorithm
Lock indicates that there is a lock at a certain isolation level. For example, at
isolation level 1, a lock is maintained for only one row at a time. If you are
at isolation level 0, no lock is acquired, but the node will still be called Lock.
In this case, the lock node verifies that the row still exists.
☞ For more information, see “How locking works” on page 135.
Row limit algorithm
Row limits are set by the TOP n or FIRST clause of the SELECT statement.
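For example, the following query is a sketch against the sample database; it returns only the first row of the sorted result:

SELECT FIRST *
FROM employee
ORDER BY emp_lname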
☞ For more information, see “SELECT statement” [ASA SQL Reference,
page 575].
The Bloom filter or hash map is a data structure that represents the
distribution of values in a column or set of columns. It may be used in
queries that satisfy the following conditions:
♦ An operation in the query reads its entire input before returning a row to
later operations. For example, a join of two tables on a single column
requires that all the relevant rows be read to establish whether they meet
the criterion for the join.
♦ A later operation in the query refers to the rows in the result of the
operation. For example, a second join on the same column would use
only those rows that satisfy the first join.
Explode algorithm
The Explode algorithm is used during the execution of set operations such as
EXCEPT and INTERSECT. It is a feature of such operations that the
number of rows in the result set is explicitly related to the number of rows in
the two sets being operated on. The Explode algorithm ensures that the
number of rows in the result set is correct.
☞ For more information, see “Performing set operations on query results
with UNION, INTERSECT, and EXCEPT” on page 253.
Window functions algorithm
If an updated row no longer fits on its originally assigned page, then the row
splits and the extra information is inserted on another page.
This characteristic deserves special attention, especially since Adaptive
Server Anywhere allows no extra space when you insert the row. For
example, suppose you insert a large number of empty rows into a table, then
fill in the values, one column at a time, using update statements. The result
would be that almost every value in a single row is stored on a separate
page. To retrieve all the values from one row, the engine may need to read
several disk pages. This simple operation would become extremely and
unnecessarily slow.
You should consider filling new rows with data at the time of insertion. Once
inserted, they then have sufficient room for the data you expect them to hold.
A database file never shrinks    As you insert and delete rows from the database, Adaptive Server
Anywhere automatically reuses the space they occupy. Thus, Adaptive Server
Anywhere may insert a row into space formerly occupied by another row.
Adaptive Server Anywhere keeps a record of the amount of empty space on
each page. When you ask it to insert a new row, it first searches its record of
space on existing pages. If it finds enough space on an existing page, it
places the new row on that page, reorganizing the contents of the page if
necessary. If not, it starts a new page.
Over time, however, if you delete a number of rows and don’t insert new
rows small enough to use the empty space, the information in the database
may become sparse. You can reload the table, or use the REORGANIZE
TABLE statement to defragment the table.
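For example, the following statement is a sketch (see the reference below for additional clauses); it defragments a single table:

REORGANIZE TABLE sales_order_items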
☞ For more information, see “REORGANIZE TABLE statement” [ASA
SQL Reference, page 555].
Adaptive Server Anywhere maintains bitmaps for tables within databases that have at least a 2K page size. Each table’s bitmap
reflects the position of each table page in the entire dbspace file. For
databases of 2K, 4K, or 8K pages, the server utilizes the bitmap to read large
blocks (64K) of table pages instead of single pages at a time, reducing the
total number of I/O operations to disk and hence improving performance.
Users cannot control the server’s criteria for bitmap creation or usage.
Note that bitmaps, also called page maps, are only available for databases
created in version 8.0 and higher. If a database is upgraded from an older
version, the server will not create a bitmap for database tables, even if they
meet its criteria. Bitmaps are not created for work tables or system tables.
Should you choose a larger page size, such as 4K, you may wish to
increase the size of the cache. Fewer large pages can fit into the same space.
For example, 1 MB of memory can hold 1000 pages that are each 1K in
size, but only 250 pages that are 4K in size. How many pages is enough
depends entirely on your database and the nature of the queries your
application performs. You can conduct performance tests with various cache
sizes. If your cache cannot hold enough pages, performance suffers as
Adaptive Server Anywhere begins swapping frequently-used pages to disk.
Page sizes also affect indexes. By default, index pages have a hash size of 10
bytes: they store approximately the first 10 bytes of data for each index
entry. This allows for a fan-out of roughly 200 using 4K pages, meaning that
each index page references about 200 rows, and a two-level index can
reference about 40 000 rows.
Each new level of an index allows for a table 200 times larger. Page size can
significantly affect fan-out, in turn affecting the depth of index required for a
table. Large databases should have 4K pages.
Adaptive Server Anywhere attempts to fill pages as much as possible. Empty
space accumulates only when new objects are too large to fit empty space on
existing pages. Consequently, adjusting the page size may not significantly
affect the overall size of your database.
Indexes
Indexes can greatly improve the performance of searches on the indexed
column(s). However, indexes take up space within the database and slow
down insert, update, and delete operations. This section will help you to
determine when you should create an index and tell you how to achieve
maximum performance from your index.
There are many situations in which creating an index improves the
performance of a database. An index provides an ordering of the rows of a
table on the basis of the values in some or all of the columns. An index
allows Adaptive Server Anywhere to find rows quickly. It permits greater
concurrency by limiting the number of database pages accessed. An index
also affords Adaptive Server Anywhere a convenient means of enforcing a
uniqueness constraint on the rows in a table.
The Index Consultant is a tool that assists you in the selection of an
appropriate set of indexes for your database. For more information, see
“Index Consultant overview” on page 67.
Note: Windows Performance Monitor may not be available on Windows
CE, 95, 98, or Me.
In addition, the number of full compares is provided in the graphical plan
with statistics. For more information, see “Common statistics used in the
plan” on page 453.
☞ For more information on the FullCompare property, see “Database-level
properties” [ASA Database Administration Guide, page 682].
Index structure and index fan-out    Indexes are organized in a number of levels, like a tree. The first page
of an index, called the root page, branches into one or more pages at the next
level, and each of those pages branches again, until the lowest level of the index is
reached. These lowest level index pages are called leaf pages. To locate a
specific row, an index with n levels requires n reads for index pages and one
read for the data page containing the actual row. In general, fewer than n
reads from disk are needed, since index pages that are used frequently tend
to be stored in cache.
The index fan-out is the number of index entries stored on a page. An index
with a higher fan-out may have fewer levels than an index with a lower
fan-out. Therefore, higher index fan-out generally means better index
performance.
You can see the number of levels in an index by using the sa_index_levels
system procedure.
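For example, the following calls are a sketch (the optional arguments and required permissions are described in the reference below):

-- Report the number of levels in each index in the database
CALL sa_index_levels()

-- Restrict the report to the indexes on one table (assumed argument form)
CALL sa_index_levels( 'employee' )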
☞ For more information, see “sa_index_levels system procedure” [ASA
SQL Reference, page 762].
Composite indexes
An index can contain one, two, or more columns. An index on two or more
columns is called a composite index. For example, the following statement
creates a two-column composite index:
CREATE INDEX name
ON employee (emp_lname, emp_fname)
A composite index is useful if the first column alone does not provide high
selectivity. For example, a composite index on emp_lname and emp_fname
is useful when many employees have the same last name. A composite index
on emp_id and emp_lname would not be useful because each employee has
a unique ID, so the column emp_lname does not provide any additional
selectivity.
Additional columns in an index can allow you to narrow down your search,
but having a two-column index is not the same as having two separate
single-column indexes.
Suppose you then want to search for the first name John. The only useful
index is the one containing the first name in the first column of the index.
The index organized by last name then first name is of no use because
someone with the first name John could appear anywhere in the index.
If you think it likely that you will need to look up people by first name only
or by last name only, then you should consider creating both of these
indexes.
Alternatively, you could make two indexes, each containing only one of the
columns. Remember, however, that Adaptive Server Anywhere only uses
one index to access any one table while processing a single query. Even if
you know both names, it is likely that Adaptive Server Anywhere will need
to read extra rows, looking for those with the correct second name.
When you create an index using the CREATE INDEX command, as in the
example above, the columns appear in the order shown in your command.
Primary key indexes and column order    The order of the columns in a primary key index is enforced to be the
same as the order in which the columns appear in the table’s definition,
regardless of the ordering of the columns specified in the PRIMARY KEY constraint.
Moreover, Adaptive Server Anywhere enforces an additional constraint that
a table’s primary key columns must be at the beginning of each row. Hence
if a primary key is added to an existing table the server may rewrite the
entire table to ensure that the key columns are at the beginning of each row.
In situations where more than one column appears in a primary key, you
should consider the types of searches needed. If appropriate, switch the
order of the columns in the table definition so the most frequently
searched-for column appears first, or create separate indexes, as required, for
the other columns.
Composite indexes and ORDER BY    By default, the columns of an index are sorted in ascending order, but
they can optionally be sorted in descending order by specifying DESC in the
CREATE INDEX statement.
Adaptive Server Anywhere can choose to use an index to optimize an
ORDER BY query as long as the ORDER BY clause contains only columns
included in that index. In addition, the columns in the index must be ordered
in exactly the same way, or in exactly the opposite way, as the ORDER BY
clause. For single-column indexes, the ordering is always such that it can be
optimized, but composite indexes require slightly more thought. The table
below shows the possibilities for a two-column index.
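Extending the same idea to three columns, the following sketch is illustrative only (the table example and the index name idx_mixed are hypothetical); suppose an index is created with mixed sort orders:

CREATE INDEX idx_mixed
ON example ( col1 ASC, col2 DESC, col3 ASC )

Such an index can be used to optimize

SELECT col1, col2, col3 from example
ORDER BY col1 ASC, col2 DESC, col3 ASC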
and
SELECT col1, col2, col3 from example
ORDER BY col1 DESC, col2 ASC, col3 DESC
The index is not used to optimize a query with any other pattern of ASC and
DESC in the ORDER BY clause. For example:
SELECT col1, col2, col3 from example
ORDER BY col1 ASC, col2 ASC, col3 ASC
is not optimized.
♦ Reduce locks Indexes reduce the number of rows and pages that must
be locked during inserts, updates, and deletes. This reduction is a result
of the ordering that indexes impose on a table.
☞ For more information on indexes and locking, see “How locking
works” on page 135.
♦ Estimate selectivity Because an index is ordered, the optimizer can
estimate the percentage of values that satisfy a given query by scanning
the upper levels of the index. This action is called a partial index scan.
Types of index
Adaptive Server Anywhere supports two types of index, and automatically
chooses between them depending on the declared width of the indexed
columns. For a total column width that is less than 10 bytes, Adaptive Server
Anywhere uses a B-tree index that contains an order-preserving encoding, or
hash value, that represents the indexed data. Hash B-tree indexes are also
used when the index key length is longer than one-eighth of the page size for
the database or 256 bytes (whichever is lower). For data values whose
combined declared length is between these two bounds, Adaptive Server
Anywhere uses a compressed B-tree index that stores each key in a
compressed form.
Indexes can be stored as either clustered or unclustered. Clustered indexes
may assist performance, but only one index on a table can be clustered.
When you index a column containing short data types, the hash value that
Adaptive Server Anywhere creates takes the same amount of space as the
original value. For example, the hash value for an integer is 4 bytes in size,
the same amount of space as required to store an integer. Because the hash
value is the same size, Adaptive Server Anywhere can use hash values with a
one-to-one correspondence to the actual value. Adaptive Server Anywhere
can always tell whether two values are equal, or which is greater by
comparing their hash values. However, it can retrieve the actual value only
by reading the entry from the corresponding table.
When you index a column containing larger data types, the hash value will
often be shorter than the size of the type. For example, if you index a
column of string values, the hash value used is at most 9 bytes in length.
Consequently, Adaptive Server Anywhere cannot always compare two
strings using only the hash values. If the hash values are equal, Adaptive
Server Anywhere must retrieve and compare the actual two values from the
table.
For example, suppose you index the titles of books, many of which are
similar. If you wish to search for a particular title, the index may identify
only a set of possible rows. In this case, Adaptive Server Anywhere must
retrieve each of the candidate rows and examine the full title.
Composite indexes An ordered sequence of columns is also called a composite index. However,
each index key in these indexes is at most a 9 byte hash value. Hence, the
hash value cannot necessarily identify the correct row uniquely. When two
hash values are equal, Adaptive Server Anywhere must retrieve and compare
the actual values.
The page size of the database can have a significant effect on the index
fan-out. The index fan-out approximately doubles as the page size doubles.
Each index lookup requires one page read for each of the levels of the index
plus one page read for the table page, and a single query can require several
thousand index lookups. A large fan-out often means that fewer index levels
are required, which can improve searches considerably. For this reason,
consider using a large page size, such as 4K, to improve index performance.
You may also want to consider using a larger page size when you wish to
index long string columns using compressed B-tree indexes, but the size
limit on smaller page sizes is preventing their creation.
Predicate analysis
A predicate is a conditional expression that, combined with the logical
operators AND and OR, makes up the set of conditions in a WHERE,
HAVING, or ON clause. In SQL, a predicate that evaluates to UNKNOWN
is interpreted as FALSE.
A predicate that can exploit an index to retrieve rows from a table is called
sargable. This name comes from the phrase search argument-able.
Predicates that involve comparisons of a column with constants, other
columns, or expressions may be sargable.
The predicate in the following statement is sargable. Adaptive Server
Anywhere can evaluate it efficiently using the primary index of the
employee table.
SELECT *
FROM employee
WHERE employee.emp_id = 123
employee<employee>
employee<seq>
Similarly, no index can assist in a search for all employees whose first name
ends in the letter “k”. Again, the only means of computing this result is to
examine each of the rows individually.
Functions In general, a predicate that has a function on the column name is not
sargable. For example, an index would not be used on the following query:
SELECT * from sales_order
WHERE year(order_date)=’2000’
You can sometimes rewrite a query to avoid using a function, thus making it
sargable. For example, you can rephrase the above query:
SELECT * from sales_order
WHERE order_date > ’1999-12-31’
AND order_date < ’2001-01-01’
A query that uses a function becomes sargable if you store the function
value in a computed column.
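For example, the year value can be stored in a computed column (a sketch; the column name order_year is an assumption carried through the following statements):

ALTER TABLE sales_order
ADD order_year INTEGER
COMPUTE( year( order_date ) )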
You can then add an index on the column order_year in the ordinary way:
CREATE INDEX idx_year
ON sales_order (order_year)
When a subsequent query uses the expression year(order_date) in a predicate,
the server recognizes that there is an indexed column that holds that
information and uses that index to answer the query.
The domain of the computed column must be equivalent to the domain of
the COMPUTE expression in order for the column substitution to be made.
In the above example, if year(order_date) had returned a string instead
of an integer, the optimizer would not have substituted the computed column
for the expression, and consequently the index idx_year could not have been
used to retrieve the required rows.
☞ For more information about computed columns, see “Working with
computed columns” on page 49.
Examples In each of these examples, attributes x and y are each columns of a single
table. Attribute z is contained in a separate table. Assume that an index
exists for each of these attributes.
Sargable                  Non-sargable
x = 10                    x <> 10
x IS NULL                 x IS NOT NULL
x > 25                    x = 4 OR y = 5
x = z                     x = y
x IN (4, 5, 6)            x NOT IN (4, 5, 6)
x LIKE ‘pat%’             x LIKE ‘%tern’
x = 20 - 2                x + 2 = 20
Sometimes it may not be obvious whether a predicate is sargable. In these
cases, you may be able to rewrite the predicate so it is sargable. For
example, you could rewrite the predicate x LIKE ‘pat%’ using the fact that
“u” is the next letter in the alphabet after “t”: x >= ’pat’ and x < ’pau’. In
this form, an index on attribute x is helpful in locating values in the
restricted range. Fortunately, Adaptive Server Anywhere makes this
particular transformation for you automatically.
A sargable predicate used for indexed retrieval on a table is a matching
predicate. A WHERE clause can have a number of matching predicates.
Which is most suitable can depend on the join strategy. The optimizer
re-evaluates its choice of matching predicates when considering alternate
join strategies.
p<seq>
The database server actually executes the semantically equivalent query:
SELECT p.id, p.quantity
FROM product p
Similarly, the result of the following query contains the primary keys of both
tables so each row in the result must be distinct.
SELECT DISTINCT *
FROM sales_order o JOIN customer c
ON o.cust_id = c.id
WHERE c.state = ’NY’
Subquery unnesting
You may express statements as nested queries, given the convenient syntax
provided in the SQL language. However, rewriting nested queries as joins
often leads to more efficient execution and more effective optimization,
since Adaptive Server Anywhere can take better advantage of highly
selective conditions in a subquery’s WHERE clause.
Examples The subquery in the following example can match at most one row for each
row in the outer block. Because it can match at most one row, Adaptive
Server Anywhere recognizes that it can convert it to an inner join.
SELECT s.*
FROM sales_order_items s
WHERE EXISTS
( SELECT *
FROM product p
WHERE s.prod_id = p.id
AND p.id = 300 AND p.quantity > 300)
SELECT p.*
FROM product p, sales_order_items s
WHERE p.id = s.prod_id
AND s.id = 2001
AND s.line_id = 1
It is quite common for queries to restrict the result of a view so that only a
few of the records are returned. In cases where the view contains GROUP
BY or UNION, it would be preferable for the server to only compute the
result for the desired rows.
Example Suppose we have the view product_summary defined as
CREATE VIEW product_summary( product_id, num_orders, total_qty)
as
SELECT prod_id, count(*), sum( quantity )
FROM sales_order_items
GROUP BY prod_id
which returns, for each product ordered, a count of the number of orders that
include it, and the sum of the quantities ordered over all of the orders. Now
consider the following query over this view:
SELECT *
FROM product_summary
WHERE product_id = 300
which restricts the output to that for product id 300. The query and the query
from the view could be combined into one semantically equivalent SELECT
statement, namely:
SELECT prod_id, count(*), sum( quantity )
FROM sales_order_items
GROUP BY prod_id
HAVING prod_id = 300.
A naive execution plan for this query would involve computing the
aggregates for each product, and then restricting the result to only the single
row for product ID 300. However, the HAVING predicate on the product_id
column can be pushed into the query’s WHERE clause since it is a grouping
column, yielding
SELECT prod_id, count(*), sum( quantity )
FROM sales_order_items
WHERE prod_id = 300
GROUP BY prod_id
Join elimination
The join elimination rewrite optimization reduces the join degree of the
query by eliminating tables from the query when it is safe to do so.
Typically, this optimization is used when the query contains a primary
key-foreign key join, and only primary key columns from the primary table
are referenced in the query.
Example For example, the query
SELECT s.id, s.line_id, p.id
FROM sales_order_items s KEY JOIN product p
would be rewritten as
SELECT s.id, s.line_id, s.prod_id
FROM sales_order_items s
WHERE s.prod_id IS NOT NULL.
The second query is semantically equivalent to the first because any row
from the sales_order_items table that has a NULL foreign key to product
will not appear in the result.
The join elimination optimization can also apply to tables involved in outer
joins, although the conditions for which the optimization is valid are much
more complex. Under certain other conditions, tables involved in primary
key-primary key joins may also be candidates for elimination.
Users should be aware that when this optimization is used, the result of a
DESCRIBE can differ from the expected result due to the substitution of
columns. In addition, an UPDATE or DELETE WHERE CURRENT request
may fail if the update statement refers to one or more of the eliminated base
tables. To circumvent this problem, either ensure that additional columns
from the eliminated table are present in the query’s SELECT list (to avoid
the optimization in the first place), or update the necessary row(s) using a
separate statement.
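For example, referencing a non-key column from the primary table keeps that table in the query and prevents the elimination; product.name below is only an illustrative choice of column:
SELECT s.id, s.line_id, p.id, p.name
FROM sales_order_items s KEY JOIN product p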
A query that computes a simple MIN aggregate, such as
SELECT MIN( quantity )
FROM sales_order_items
WHERE prod_id = 300
is rewritten internally as
SELECT MIN( quantity )
FROM ( SELECT FIRST quantity
FROM sales_order_items
WHERE prod_id = 300 and quantity IS NOT NULL
ORDER BY prod_id ASC, quantity ASC ) as s(quantity)
IN-list optimization
The optimizer converts the IN-list predicate into a nested-loop join. The following example
illustrates how the optimization works.
Suppose we have the query
SELECT *
FROM sales_order
WHERE sales_rep = 142 or sales_rep = 1596
that lists all of the orders for these two sales reps. This query is semantically
equivalent to
SELECT *
FROM sales_order
WHERE sales_rep IN (142, 1596)
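Conceptually, the optimizer can then treat the IN list as a small virtual table of values joined to sales_order. The query below is only a hand-written illustration of that idea, not the server's actual internal rewrite:
SELECT o.*
FROM ( SELECT 142 AS sales_rep
       UNION ALL
       SELECT 1596 ) AS reps
     JOIN sales_order o ON o.sales_rep = reps.sales_rep
Evaluated as a nested-loop join, each value in the list can drive one probe of sales_order, using an index on sales_rep if a suitable one exists.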
LIKE optimizations
LIKE predicates involving patterns that are either literal constants or host
variables are very common. Depending on the pattern, the optimizer may
rewrite the LIKE predicate entirely, or augment it with additional conditions
that could be exploited to perform indexed retrieval on the corresponding
table.
Examples In each of the following examples, assume that the pattern in the LIKE
predicate is a literal constant or host variable, and X is a column in a base
table.
♦ X LIKE ’%’ is rewritten as X IS NOT NULL
♦ X LIKE ’abc’ is rewritten as X = ’abc’
♦ X LIKE ’abc%’ is augmented with the predicates X < ’abcZ’ and X >= ’abc_’,
where Z and _ represent the corresponding high values and low values for
the collating sequence of this database. If the database is configured to store
blank-padded strings, the second comparison operator is >, not >=, to
ensure correct semantics.
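For instance, a predicate such as lname LIKE ’Sm%’ behaves as though range conditions like the following had been added. The literal bounds shown are only illustrative; the real bounds use the high and low values of the database collation:
SELECT *
FROM customer
WHERE lname LIKE 'Sm%'
  AND lname >= 'Sm'    -- illustrative lower bound
  AND lname <  'Sn'    -- illustrative upper bound
The added range conditions are sargable, so an index on lname can be used to restrict the rows that the LIKE predicate must examine.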
For the most part, the optimizer generates a left-deep processing tree for its
access plans. The only exception to this rule is the existence of a right-deep
nested outer join expression. The query execution engine’s algorithms for
computing LEFT or RIGHT OUTER JOINs require that preserved tables
must precede null-supplying tables in any join strategy. Consequently the
optimizer looks for opportunities to convert LEFT or RIGHT outer joins to
inner joins whenever possible, since inner joins are commutable and give the
optimizer greater degrees of freedom when performing join enumeration.
A LEFT or RIGHT OUTER JOIN can be converted to an inner join when a
null-intolerant predicate on the null-supplying table is present in the query’s
WHERE clause. Since this predicate is null-intolerant, any all-NULL row
that would be produced by the outer join will be eliminated from the result,
hence making the query semantically equivalent to an inner join.
Example For example, consider the query
SELECT *
FROM product p KEY LEFT OUTER JOIN sales_order_items s
WHERE s.quantity > 15
which is intended to list all products and their orders for larger quantities; the
LEFT OUTER JOIN is intended to ensure that all products are listed, even if
they have no orders. The problem with this query is that the predicate in the
WHERE clause will eliminate from the result any product with no orders,
because the predicate s.quantity > 15 will be interpreted as FALSE if
s.quantity is NULL. Hence the query is semantically equivalent to
SELECT *
FROM product p KEY JOIN sales_order_items s
WHERE s.quantity > 15
and it is this rewritten form of the query that the server will optimize.
In this example, the query is almost certainly written incorrectly; it should
probably read
SELECT *
FROM product p
KEY LEFT OUTER JOIN sales_order_items s
ON s.quantity > 15
so that the test of quantity is part of the outer join condition.
While it is rare for this optimization to apply to straightforward outer join
queries, it can often apply when a query refers to one or more views that are
written using outer joins. The query’s WHERE clause may include
conditions that restrict the output of the view such that all null-supplying
rows from one or more table expressions would be eliminated, hence making
this optimization applicable.
An efficient access strategy for virtually any query relies on the presence of
sargable conditions in the WHERE/ON/HAVING clauses. Indexed retrieval
is possible only by exploiting sargable conditions as matching predicates. In
addition, hash, merge, and block-nested loop joins can only be used when an
equijoin condition is present. For these reasons, Adaptive Server Anywhere
does detailed analysis of the search conditions in the original query text in
order to discover simplified or implied conditions that can be exploited by
the optimizer.
As a preprocessing step, several simplifications are made to predicates in the
original statement once view expansion and merging have taken place. For
example:
♦ X = X is rewritten as X IS NOT NULL if X is nullable; otherwise the
predicate is eliminated.
♦ ISNULL(X,X) is rewritten as simply X.
When complete normalization of a complex search condition is not feasible, the
optimizer instead analyzes the original expression for exploitable predicates
that are implied by the original search condition, and ANDs these inferred
conditions to the query.
Complete normalization is also avoided if this would require duplication of
an expensive predicate (for example, a quantified subquery predicate).
However, the algorithm will merge IN-list predicates together whenever
feasible.
Once the search condition has either been completely normalized or the
exploitable conditions have been found, the optimizer performs transitivity
analysis to discover transitive equality conditions, primarily transitive join
conditions and conditions with a constant. In doing so the optimizer will
increase its degrees of freedom when performing join enumeration during its
cost-based optimization phase, since these transitive conditions may permit
additional alternative join orders.
Example Suppose the original query is
SELECT e.emp_lname, s.id, s.order_date
FROM sales_order s, employee e
WHERE (e.emp_id = s.sales_rep and
(s.sales_rep = 142 or s.sales_rep = 1596)
)
OR
( e.emp_id = s.sales_rep and s.cust_id = 667)
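The condition e.emp_id = s.sales_rep appears in both branches of the OR, so it is implied by the entire search condition and can be exploited as a join condition. A hand-written sketch of a semantically equivalent factored form (the exact internal representation is not shown in this excerpt):
SELECT e.emp_lname, s.id, s.order_date
FROM sales_order s, employee e
WHERE e.emp_id = s.sales_rep
  AND ( s.sales_rep = 142
        OR s.sales_rep = 1596
        OR s.cust_id = 667 )
Exposing the join condition in this way gives the optimizer additional freedom when enumerating join orders.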
In a case-insensitive database, the query
SELECT *
FROM customer
WHERE UPPER(lname) = ’SMITH’
is rewritten internally as
SELECT *
FROM customer
WHERE lname = ’SMITH’
☞ For more information about user-defined functions, see “CREATE
FUNCTION statement” [ASA SQL Reference, page 346].
Name                    Short plan / Short name
Hash except             EH
Hash group by           GrByH
Hash filter             HF
Hash intersect          IH
Exists join             JE
Hash join               JH
Sorted block            SrtBl
Merge join              JM
Merge intersect         IM
Row limit               RL
Row replicate           RR
Recursive union         RU
Union all               UA
Statistic               Explanation
CacheReadTable          Number of table pages that have been read from the cache.
CacheReadIndLeaf        Number of index leaf pages that have been read from the cache.
DiskReadTable           Number of table pages that have been read from disk.
DiskReadIndLeaf         Number of index leaf pages that have been read from disk.
EstDiskReadTime         Estimated time required for reading rows from the disk.
EstDiskWriteTime        Estimated time required for writing rows to the disk.
ANSI update constraints Controls the range of updates that are permitted (options are OFF, CURSORS, and STRICT). See “ANSI_UPDATE_CONSTRAINTS option [compatibility]” [ASA Database Administration Guide, page 592].
Item                    Explanation
Locked tables           List of all locked tables and their isolation levels.

Items in the plan related to scans

Items in the plan related to DISTINCT
Text plans
There are two types of text plan: short and long. To choose a plan type in
Interactive SQL, open the Options dialog from the Tools menu, and click the
Plan tab. To use SQL functions to access the plan, see “Accessing the plan
with SQL functions” on page 467.
Colons separate join strategies The following command contains two query blocks: the outer select
statement from the sales_order and sales_order_items tables, and the
subquery that selects from the product table.
SELECT *
FROM sales_order AS o
KEY JOIN sales_order_items AS i
WHERE EXISTS
( SELECT *
FROM product p
WHERE p.id = 300 )
Colons separate join strategies. Plans always list the join strategy for the
main block first. Join strategies for other query blocks follow. The order of
join strategies for these other query blocks may not correspond to the order
in your statement nor to the order in which they execute.
Short text plan The short plan is useful when you want to compare plans quickly. It
provides the least amount of information of all the access plan formats, but it
provides it on a single line.
In the following example, the plan starts with the word SORT because the
ORDER BY clause causes the entire result set to be sorted. The customer
table is accessed by its primary key index, also called customer. An index
scan is used to satisfy the search condition because the column customer.id
is a primary key. The abbreviation JNL indicates that the optimizer is using a
nested loops join to process the query. Finally, the sales_order table is
accessed using the foreign key index ky_so_customer to find matching rows
in the customer table.
☞ For more information about code words used in the plan, see
“Abbreviations used in the plan” on page 451.
Long text plan The long plan provides a little more information than the short plan, and
provides information in a way that is easy to print and view without scrolling
down.
In the long plan output for the same query, the first line is Plan [ I/O
Estimate: 1 ]. The words Plan or Sub-plan indicate the start of a query
block (in this case, there is only one). The I/O estimates how many I/O are
required for the query (in this case, one). Again, the plan indicates that the
results are sorted, and that a nested loops join is the join algorithm to be
used. On the same line as the algorithm, there is either the word TRUE or
the search condition and selectivity estimate for the algorithm (in this case,
there is none). The WHERE condition is represented on the line starting
with FILTER, followed by the search condition, selectivity estimate for the
search condition, and source of the selectivity estimate.
☞ For more information about code words used in the plan, see
“Abbreviations used in the plan” on page 451.
Graphical plans
There are two types of graphical plan: the graphical plan, and the graphical
plan with statistics. To choose a plan type in Interactive SQL, open the
Options dialog from the Tools menu, and click the Plan tab. To access the
plan with SQL functions, see “Accessing the plan with SQL functions” on
page 467.
Once the graphical plan is displayed, you can configure the way it is
displayed by right-clicking the left pane and choosing Customize.
You can print the graphical plan for later reference. To print the plan,
right-click a node and choose Print.
To obtain context-sensitive help for each node in the graphical plan, select
the node, right-click it and choose Help. For example, right-click Nested
Loops Join and choose Help for information about the resources used by that
part of the execution. There is also pop-up information that is available by
hovering your cursor over each element in the graphical plan.
Graphical plan The graphical plan provides a great deal more information than the short or
long plans. You can choose to see either the graphical plan, or the graphical
plan with statistics. Both allow you to quickly view which parts of the plan
have been estimated as the most expensive. The graphical plan with
statistics, though more expensive to view, also provides the actual query
execution statistics as monitored by the server when the query is executed,
and permits direct comparison between the estimates used by the query
optimizer in constructing the access plan with the actual statistics monitored
during execution. Note, however, that the optimizer is often unable to
precisely estimate a query’s cost, so expect there to be differences. The
graphical plan is the default format for access plans.
The graphical plan is designed to provide some key information visually:
♦ Each operation displayed in the graphical plan is displayed in a container.
The container indicates whether the operation materializes data, whether
it is an index scan, whether it is a table scan, or whether it is some other
operation.
♦ The number of rows that an operation passes to the next operation in the
plan is indicated by the thickness of the line joining the operations. This
provides a visual indicator of the operations carried out on most data in
the query.
♦ The container for an operation that is particularly slow is given a red
border.
In general, the nodes on the thick lines and in subtrees with red boxes are the
most likely to determine query performance.
Each of these display features is customizable.
Following is the same query that was used to describe the short and long text
plans, presented with the graphical plan. The diagram is in the form of a
tree, indicating that each node requests rows from the nodes beneath it. The
Lock node indicates that the result set is materialized, or that a row is
returned to an application. In this case, the sort requires that results are
materialized. At level 0, rows aren’t really locked: Adaptive Server
Anywhere just ensures that the row has not been deleted since it was read
from the base tables. At level 1, a row is locked only until the next row is
accessed. At levels 2 and 3, read locks are applied and held until COMMIT.
You can obtain detailed information about the nodes in the plan by clicking
the node in the graphical diagram. In this example, the nested loops join
node is selected. The information in the right pane pertains only to that
node. For example, the Predicate description is simply TRUE, indicating
that at this stage in the query execution no predicate is being applied (the
only predicate in the query (customer.id < 100) is applied in the customer
table query node).
Use the graphical plan with statistics when you are having performance
problems and if the estimated row count or run time differs from your
expectations. The graphical plan with statistics provides estimates and actual
statistics for you to compare. A large difference between actual and estimate
is a warning sign that the optimizer might not have sufficient information to
choose a good access plan.
The database options and other global settings that affect query execution
are displayed for the root operator only.
Following are some of the key statistics you can check in the graphical plan
with statistics, and some possible remedies:
♦ Selectivity statistics The selectivity of a predicate (conditional
expression) is the percentage of rows that satisfy the condition. The
estimated selectivity of predicates provides the information on which the
optimizer bases its cost estimates. If the selectivity estimates are poor, the
query optimizer may generate a poor access plan. For example, if the
optimizer mistakenly estimates a predicate to be highly selective (with,
say, a selectivity of 5%), but the actual selectivity is much lower (for
example, 50%), then it may choose an inappropriate plan. Selectivity
estimates are not precise, but a significantly large error does indicate a
possible problem.
If you determine that the selectivity information for a key part of your
query is inaccurate, you may be able to use CREATE STATISTICS to
generate a new set of statistics for the column in question. In rare cases,
you may wish to supply explicit selectivity estimates, although these
present potential maintenance problems.
☞ For more information about selectivity, see “Selectivity in the plan”
on page 465.
☞ For more information about creating statistics, see “CREATE
STATISTICS statement” [ASA SQL Reference, page 377].
☞ For more information about user estimates, see “Explicit selectivity
estimates” [ASA SQL Reference, page 31].
Indicators of poor selectivity occur in the following places:
• RowsReturned actuals and estimates RowsReturned is the number
of rows in the result set. The RowsReturned statistic appears in the
table for the root node at the top of the tree. A significant difference
between the estimated rows returned and the actual number returned is
a warning sign that the optimizer is working on poor selectivity
information.
• Predicate selectivity actuals and estimates Look for a subheading
of Predicate to see predicate selectivities. For information about
reading the predicate information, see “Selectivity in the plan” on
page 465.
If the predicate is over a base column for which there does not exist a
histogram, executing a CREATE STATISTICS statement to create a
histogram may correct the problem. If selectivity error remains a
problem then, as a last resort, you may wish to consider specifying a
user estimate of selectivity along with the predicate in the query text.
• Estimate source The source of selectivity estimates is also listed
under the Predicate subheadings in the statistics pane. For information
about reading the predicate information, see “Selectivity in the plan”
on page 465.
An estimate source of Guess indicates that the optimizer has no
selectivity information to use. If the estimate source is Index and the
selectivity estimate is incorrect, your problem may be that the index is
skewed: you may benefit from defragmenting the index with the
REORGANIZE TABLE statement.
For a complete list of the possible sources of selectivity estimates, see
“ESTIMATE_SOURCE function [Miscellaneous]” [ASA SQL Reference,
page 138].
♦ Cache reads and hits If the number of cache reads and the number of cache hits are
exactly the same, then all the information needed to execute the query is
in cache—an excellent thing. When reads are greater than hits, it means
that the server is attempting to go to cache but failing, and that it must
read from disk. In some cases, such as hash joins, this is expected. In
other cases, such as nested loops joins, a poor cache-hit ratio may
indicate a performance problem, and you may benefit from increasing
your cache size.
☞ For more information about cache management, see “Increase the
cache size” on page 165.
♦ Lack of appropriate indexes It is often not obvious from query
execution plans whether an index would help provide better performance
or not. Some of the scan-based algorithms used in Adaptive Server
Anywhere provide excellent performance for many queries without using
indexes.
If you suspect that additional indexes may help provide better
performance for a query, use the Index Consultant. For more information,
see “Index Consultant overview” on page 67.
☞ For more information about indexes and performance, see “Top
performance tips” on page 165 and “Using indexes” on page 167.
♦ Data fragmentation problems The Runtime actual and estimated
values are provided in the root node statistics. Runtime measures the time
to execute the query. If the runtime is unexpectedly high for a table scan or index
scan, you may improve performance by executing the REORGANIZE
TABLE statement.
☞ For more information, see “REORGANIZE TABLE statement” [ASA
SQL Reference, page 555] and “Examine file, table, and index
fragmentation” on page 175.
Following is an example of the graphical plan with statistics. Again, the
nested loops join node is selected. The statistics in the right pane indicate the
resources used by that part of the query.
☞ For more information about code words used in the plan, see
“Abbreviations used in the plan” on page 451.
Selectivity in the plan Following is an example of the Predicate showing selectivity of a search
condition. In this example, the Filter node is selected, and the statistics pane
shows the Predicate as the search condition and selectivity statistics.
The access plan depends on the statistics available in the database, which
depends in turn on what queries have previously been executed. You may
see different statistics and plans from those shown here.
This predicate description is
department.dept_name = ’Sales’ : 5% Guess; true 1/5 20%
It shows the search condition itself, the selectivity estimate the optimizer used
for it (5%), the source of that estimate (Guess), and the statistics gathered during
execution: the predicate evaluated to true for 1 of the 5 rows tested, an actual
selectivity of 20%.
Note: If you select the graphical plan, but not the graphical plan with
statistics, the final two statistics are not displayed.
Accessing the plan with SQL functions
You can access the plan using SQL functions, and retrieve the output in
XML format.
♦ To access the short plan, see the “EXPLANATION function
[Miscellaneous]” [ASA SQL Reference, page 144].
♦ To access the long plan, see the “PLAN function [Miscellaneous]” [ASA
SQL Reference, page 188].
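A minimal sketch of calling these functions from Interactive SQL; the statement passed as the argument is only an example:
SELECT EXPLANATION( 'SELECT * FROM customer WHERE id < 100' );
SELECT PLAN( 'SELECT * FROM customer WHERE id < 100' );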
PART III
Transact-SQL Compatibility
About this chapter Transact-SQL is the dialect of SQL supported by Sybase Adaptive Server
Enterprise.
This chapter is a guide for creating applications that are compatible with
both Adaptive Server Anywhere and Adaptive Server Enterprise. It
describes Adaptive Server Anywhere support for Transact-SQL language
elements and statements, and for Adaptive Server Enterprise system tables,
views, and procedures.
An overview of Transact-SQL support
Adaptive Server Anywhere supports a large subset of Transact-SQL, which
is the dialect of SQL supported by Sybase Adaptive Server Enterprise. This
chapter describes compatibility of SQL between Adaptive Server Anywhere
and Adaptive Server Enterprise.
Goals The goals of Transact-SQL support in Adaptive Server Anywhere are as
follows:
♦ Application portability Many applications, stored procedures, and
batch files can be written for use with both Adaptive Server Enterprise
and Adaptive Server Anywhere databases.
Statements allowed
ASA-only statements Transact-SQL statements
in both servers
Transact-SQL control
ASA control statements,
statements, CREATE
CREATE PROCEDURE SELECT, INSERT,
PROCEDURE statement,
statement, CREATE UPDATE, DELETE,...
CREATE TRIGGER
TRIGGER statement,...
statement,...
Similarities and differences Adaptive Server Anywhere supports a very high percentage of Transact-SQL
language elements, functions, and statements for working with existing data.
For example, Adaptive Server Anywhere supports all of the numeric
functions, all but one of the string functions, all aggregate functions, and all
date and time functions. As another example, Adaptive Server Anywhere
supports Transact-SQL outer joins (using =* and *= operators) and extended
DELETE and UPDATE statements using joins.
Further, Adaptive Server Anywhere supports a very high percentage of the
Transact-SQL stored procedure language (CREATE PROCEDURE and
CREATE TRIGGER syntax, control statements, and so on) and many, but
not all, aspects of Transact-SQL data definition language statements.
There are design differences in the architectural and configuration facilities
supported by each product. Device management, user management, and
maintenance tasks such as backups tend to be system-specific. Even here,
Adaptive Server Anywhere provides Transact-SQL system tables as views,
where the tables that are not meaningful in Adaptive Server Anywhere have
no rows. Also, Adaptive Server Anywhere provides a set of system
procedures for some of the more common administrative tasks.
This chapter looks first at some system-level issues where differences are
most noticeable, before discussing data manipulation and data definition
language aspects of the dialects where compatibility is high.
Transact-SQL only Some SQL statements supported by Adaptive Server Anywhere are part of
one dialect, but not the other. You cannot mix the two dialects within a
procedure, trigger, or batch. For example, Adaptive Server Anywhere
supports the following statements, but as part of the Transact-SQL dialect
only:
♦ Transact-SQL control statements IF and WHILE
Adaptive Server Anywhere only Adaptive Server Enterprise does not support the following statements:
♦ control statements CASE, LOOP, and FOR
♦ Adaptive Server Anywhere versions of IF and WHILE
♦ CALL statement
♦ Adaptive Server Anywhere versions of the CREATE PROCEDURE,
CREATE FUNCTION, and CREATE TRIGGER statements
♦ SQL statements separated by semicolons
Notes The two dialects cannot be mixed within a procedure, trigger, or batch. This
means that:
♦ You can include Transact-SQL-only statements together with statements
that are part of both dialects in a batch, procedure, or trigger.
♦ You can include statements not supported by Adaptive Server Enterprise
together with statements that are supported by both servers in a batch,
procedure, or trigger.
♦ You cannot include Transact-SQL-only statements together with
Adaptive Server Anywhere-only statements in a batch, procedure, or
trigger.
Each product has its own CREATE DATABASE and DROP DATABASE statements
with different syntax.
Device management
Adaptive Server Anywhere and Adaptive Server Enterprise use different
models for managing devices and disk space, reflecting the different uses for
the two products. While Adaptive Server Enterprise sets out a
comprehensive resource management scheme using a variety of
Transact-SQL statements, Adaptive Server Anywhere manages its own
resources automatically, and its databases are regular operating system files.
Adaptive Server Anywhere does not support Transact-SQL DISK
statements, such as DISK INIT, DISK MIRROR, DISK REFIT, DISK
REINIT, DISK REMIRROR, and DISK UNMIRROR.
☞ For information on disk management, see “Working with Database
Files” [ASA Database Administration Guide, page 257].
☞ For a description of the Adaptive Server Anywhere syntax for these
statements, see “SQL Statements” [ASA SQL Reference, page 241].
System tables
In addition to its own system tables, Adaptive Server Anywhere provides a
set of system views that mimic relevant parts of the Adaptive Server
Enterprise system tables. You’ll find a list and individual descriptions in
“Views for Transact-SQL compatibility” [ASA SQL Reference, page 743],
which describes the system catalogs of the two products. This section
provides a brief overview of the differences.
The Adaptive Server Anywhere system tables rest entirely within each
database, while the Adaptive Server Enterprise system tables rest partly
inside each database and partly in the master database. The Adaptive Server
Anywhere architecture does not include a master database.
In Adaptive Server Enterprise, the database owner (user ID dbo) owns the
system tables. In Adaptive Server Anywhere, the system owner (user ID
SYS) owns the system tables. A dbo user ID owns the Adaptive Server
Enterprise-compatible system views provided by Adaptive Server
Anywhere.
Administrative roles
Adaptive Server Enterprise has a more elaborate set of administrative roles
than Adaptive Server Anywhere. In Adaptive Server Enterprise there is a set
of distinct roles, although more than one login account on an Adaptive
Server Enterprise can be granted any role, and one account can possess more
than one role.
Adaptive Server Enterprise roles In Adaptive Server Enterprise, distinct roles include:
♦ System Administrator Responsible for general administrative tasks
unrelated to specific applications; can access any database object.
♦ System Security Officer Responsible for security-sensitive tasks in
Adaptive Server Enterprise, but has no special permissions on database
objects.
♦ Database Owner Has full permissions on objects inside the database he
or she owns, can add users to a database and grant other users the
permission to create objects and execute commands within the database.
♦ Data definition statements Permissions can be granted to users for
specific data definition statements, such as CREATE TABLE or CREATE
VIEW, enabling the user to create database objects.
♦ Object owner Each database object has an owner who may grant
permissions to other users to access the object. The owner of an object
automatically has all permissions on the object.
In Adaptive Server Anywhere, the following database-wide permissions
have administrative roles:
♦ The Database Administrator (DBA authority) has, like the Adaptive
Server Enterprise database owner, full permissions on all objects inside
the database (other than objects owned by SYS) and can grant other users
the permission to create objects and execute commands within the
database. The default database administrator is user ID DBA.
♦ The RESOURCE permission allows a user to create any kind of object
within a database. This is instead of the Adaptive Server Enterprise
scheme of granting permissions on individual CREATE statements.
♦ Adaptive Server Anywhere has object owners in the same way that
Adaptive Server Enterprise does. The owner of an object automatically
has all permissions on the object, including the right to grant permissions.
For seamless access to data held in both Adaptive Server Enterprise and
Adaptive Server Anywhere, you should create user IDs with appropriate
permissions in the database (RESOURCE in Adaptive Server Anywhere, or
permission on individual CREATE statements in Adaptive Server
Enterprise) and create objects from that user ID. If you use the same user ID
in each environment, object names and qualifiers can be identical in the two
databases, ensuring compatible access.
Users and groups are handled somewhat differently by the two servers. For example, Adaptive Server Enterprise allows each user to
be a member of only one group, while Adaptive Server Anywhere has no
such restriction. You should compare the documentation on users and groups
in the two products for specific information.
Both Adaptive Server Enterprise and Adaptive Server Anywhere have a
public group, for defining default permissions. Every user automatically
becomes a member of the public group.
Adaptive Server Anywhere supports the following Adaptive Server
Enterprise system procedures for managing users and groups.
☞ For the arguments to each procedure, see “Adaptive Server Enterprise
system and catalog procedures” [ASA SQL Reference, page 807].
For example, the following statement is valid in both Adaptive Server
Enterprise and Adaptive Server Anywhere:
GRANT INSERT, DELETE
ON TITLES
TO MARY, SALES
❖ To create a Transact-SQL compatible database (SQL)
1. Connect to any Adaptive Server Anywhere database.
♦ If you are using the dbinit command line utility, specify the -b
command-line switch.
When you choose this option, Adaptive Server Enterprise and Adaptive
Server Anywhere consider the following two strings equal:
’ignore the trailing blanks ’
’ignore the trailing blanks’
If you do not choose this option, Adaptive Server Anywhere considers the
two strings above different.
A side effect of choosing this option is that strings are padded with
blanks when fetched by a client application.
Remove historical system views Older versions of Adaptive Server Anywhere employed two system views
whose names conflict with the Adaptive Server Enterprise system views
provided for compatibility. These views include SYSCOLUMNS and
SYSINDEXES. If you are using Open Client or JDBC interfaces, create
your database excluding these views. You can do this with the dbinit -k
command-line switch.
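A sketch of a dbinit invocation that combines the two switches discussed above; the database file name is hypothetical:
dbinit -b -k asecompat.db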
If you do not use this option when creating your database, the following two
statements return different results:
SELECT * FROM SYSCOLUMNS ;
SELECT * FROM dbo.syscolumns ;
Caution
Ensure that you do not drop the dbo.syscolumns or dbo.sysindexes
system view.
Set the quoted_identifier option By default, Adaptive Server Enterprise treats identifiers and strings
differently than Adaptive Server Anywhere, which matches the SQL/92 ISO
standard.
The quoted_identifier option is available in both Adaptive Server
Enterprise and Adaptive Server Anywhere. Ensure the option is set to the
same value in both databases, for identifiers and strings to be treated in a
compatible manner.
For SQL/92 behavior, set the quoted_identifier option to ON in both
Adaptive Server Enterprise and Adaptive Server Anywhere.
For Transact-SQL behavior, set the quoted_identifier option to OFF in both
Adaptive Server Enterprise and Adaptive Server Anywhere. If you choose
this, you can no longer use identifiers that are the same as keywords,
enclosed in double quotes.
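For example, a sketch of setting the option permanently for all users on the Adaptive Server Anywhere side (set the corresponding option in Adaptive Server Enterprise as well):
SET OPTION PUBLIC.quoted_identifier = 'OFF';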
For more information on the quoted_identifier option, see
“QUOTED_IDENTIFIER option [compatibility]” [ASA Database
Administration Guide, page 639].
Set the automatic_timestamp option to ON Transact-SQL defines a timestamp column with special properties. With the
automatic_timestamp option set to ON, the Adaptive Server Anywhere
treatment of timestamp columns is similar to Adaptive Server Enterprise
behavior.
With the automatic_timestamp option set to ON in Adaptive Server
Anywhere (the default setting is OFF), any new columns with the
TIMESTAMP data type that do not have an explicit default value defined
receive a default value of timestamp.
☞ For information on timestamp columns, see “The special Transact-SQL
timestamp column and data type” on page 486.
Set the string_rtruncation option Both Adaptive Server Enterprise and Adaptive Server Anywhere support the
string_rtruncation option, which affects error message reporting when an
INSERT or UPDATE string is truncated. Ensure that each database has the
option set to the same value.
☞ For more information on the STRING_RTRUNCATION option, see
“STRING_RTRUNCATION option [compatibility]” [ASA Database
Administration Guide, page 645].
Case sensitivity
Case sensitivity in databases refers to:
♦ Data The case sensitivity of the data is reflected in indexes and so on.
In Adaptive Server Enterprise, the case sensitivity of user IDs and passwords
follows the case sensitivity of the server.
In Adaptive Server Anywhere, indexes and triggers are owned by the owner
of the table on which they are created. Index and trigger names must be
unique for a given owner. For example, while the tables t1 owned by user
user1 and t2 owned by user user2 may have indexes of the same name, no
two tables owned by a single user may have an index of the same name.
Adaptive Server Enterprise has a less restrictive name space for index names
than Adaptive Server Anywhere. Index names must be unique on a given
table, but any two tables may have an index of the same name. For
compatible SQL, stay within the Adaptive Server Anywhere restriction of
unique index names for a given table owner.
Adaptive Server Enterprise has a more restrictive name space on trigger
names than Adaptive Server Anywhere. Trigger names must be unique in the
database. For compatible SQL, you should stay within the Adaptive Server
Enterprise restriction and make your trigger names unique in the database.
The data type of a timestamp column Adaptive Server Enterprise treats a timestamp column as a domain that is
VARBINARY(8), allowing NULL, while Adaptive Server Anywhere treats a
timestamp column as the TIMESTAMP data type, which consists of the date
and time, with fractions of a second held to six decimal places.
When fetching from the table for later updates, the variable into which the
timestamp value is fetched should correspond to the column description.
Timestamping an existing table If you add a special timestamp column to an existing table, all existing rows
have a NULL value in the timestamp column. To enter a timestamp value
(the current timestamp) for existing rows, update all rows in the table such
that the data does not change. For example, the following statement updates
all rows in the sales_order table, without changing the values in any of the
rows:
UPDATE sales_order
SET region = region
If all six digits are not shown, some timestamp column values may appear to
be equal: they are not.
Using tsequal for updates With the tsequal system function you can tell whether a timestamp column
has been updated or not.
For example, an application may SELECT a timestamp column into a
variable. When an UPDATE of one of the selected rows is submitted, it can
use the tsequal function to check whether the row has been modified. The
tsequal function compares the timestamp value in the table with the
timestamp value obtained in the SELECT. Identical timestamps mean there
are no changes. If the timestamps differ, the row has been changed since the
SELECT was carried out.
A typical UPDATE statement using the tsequal function looks like this:
UPDATE publishers
SET city = ’Springfield’
WHERE pub_id = ’0736’
AND TSEQUAL(timestamp, ’1995/10/25 11:08:34.173226’)
The first argument to the tsequal function is the name of the special
timestamp column; the second argument is the timestamp retrieved in the
SELECT statement. In Embedded SQL, the second argument is likely to be
a host variable containing a TIMESTAMP value from a recent FETCH on
the column.
An IDENTITY column is declared with a numeric type such as NUMERIC( n, 0 ),
where n is large enough to hold the value of the maximum number of rows
that may be inserted into the table.
The IDENTITY column stores sequential numbers, such as invoice numbers
or employee numbers, which are automatically generated. The value of the
IDENTITY column uniquely identifies each row in a table.
In Adaptive Server Enterprise, each table in a database can have one
IDENTITY column. The data type must be numeric with scale zero, and the
IDENTITY column should not allow nulls.
In Adaptive Server Anywhere, the IDENTITY column is a column default
setting. You can explicitly insert values that are not part of the sequence into
the column with an INSERT statement. Adaptive Server Enterprise does not
allow INSERTs into identity columns unless the identity_insert option is on.
In Adaptive Server Anywhere, you need to set the NOT NULL property
yourself and ensure that only one column is an IDENTITY column.
Adaptive Server Anywhere allows any numeric data type to be an
IDENTITY column.
In Adaptive Server Anywhere the identity column and the
AUTOINCREMENT default setting for a column are identical.
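A sketch of declaring such a column using the DEFAULT AUTOINCREMENT form described above; the table and column names are hypothetical:
CREATE TABLE order_note (
   note_id   NUMERIC( 10, 0 ) NOT NULL DEFAULT AUTOINCREMENT,
   note_text VARCHAR( 255 ),
   PRIMARY KEY( note_id )
);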
The first time you insert a row into the table, an IDENTITY column has a
value of 1 assigned to it. On each subsequent insert, the value of the column
increases by one. The value most recently inserted into an identity column is
available in the @@identity global variable.
The value of @@identity changes each time a statement attempts to insert a
row into a table.
♦ If the statement affects a table without an IDENTITY column,
@@identity is set to 0.
♦ If the statement inserts multiple rows, @@identity reflects the last value
inserted into the IDENTITY column.
This change is permanent. @@identity does not revert to its previous value
if the statement fails or if the transaction that contains it is rolled back.
☞ For more information on the behavior of @@identity, see “@@identity
global variable” [ASA SQL Reference, page 46].
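Continuing the hypothetical order_note table above, a sketch of retrieving the value that was just generated:
INSERT INTO order_note( note_text ) VALUES( 'Follow up with the customer' );
SELECT @@identity;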
Writing compatible SQL statements
This section describes general guidelines for writing SQL for use on more
than one database-management system, and discusses compatibility issues
between Adaptive Server Enterprise and Adaptive Server Anywhere at the
SQL statement level.
♦ Assume large namespaces. For example, ensure that each index has a
unique name.
Syntax SELECT [ ALL | DISTINCT ] select-list
. . . [ INTO #temporary-table-name ]
. . . [ FROM table-spec [ HOLDLOCK | NOHOLDLOCK ],
. . . table-spec [ HOLDLOCK | NOHOLDLOCK ], . . . ]
. . . [ WHERE search-condition ]
. . . [ GROUP BY column-name, . . . ]
. . . [ HAVING search-condition ]
[ ORDER BY { expression | integer }
[ ASC | DESC ], . . . ]
Parameters select-list:
table-name.*
| *
| expression
| alias-name = expression
| expression as identifier
| expression as T_string
table-spec:
[ owner . ]table-name
. . . [ [ AS ] correlation-name ]
. . . [ ( INDEX index_name [ PREFETCH size ][ LRU | MRU ] ) ]
alias-name:
identifier | ’string’ | "string"
Adaptive Server Anywhere does not support the following clauses of the
Transact-SQL SELECT statement:
♦ SHARED keyword
♦ COMPUTE clause
♦ FOR BROWSE clause
♦ The FOR READ ONLY clause and the FOR UPDATE clause are parsed,
but have no effect.
♦ The HOLDLOCK option applies only to the table or view for which it is
specified, and only for the duration of the transaction defined by the
statement in which it is used. Setting the isolation level to 3 applies a
holdlock for each select within a transaction. You cannot specify both a
HOLDLOCK and NOHOLDLOCK option in a query.
The variable name can optionally be set using the SET statement and the
Transact-SQL convention of an @ sign preceding the name:
SET @localvar = 42
♦ Adaptive Server Enterprise does not support the following clauses of the
SELECT statement syntax:
• INTO host-variable-list
• INTO variable-list.
• Parenthesized queries.
♦ Adaptive Server Enterprise uses join operators in the WHERE clause,
rather than the FROM clause and the ON condition, for joins.
Compatibility of joins
In Transact-SQL, joins appear in the WHERE clause, using the following
syntax:
start of select, update, insert, delete, or subquery
FROM { table-list | view-list } WHERE [ NOT ]
[ table-name.| view name.]column-name
join-operator
[ table-name.| view-name.]column_name
[ { AND | OR } [ NOT ]
[ table-name.| view-name.]column_name
join-operator
[ table-name.| view-name.]column-name ]. . .
end of select, update, insert, delete, or subquery
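For example, a join written with a join operator in the WHERE clause is accepted by both servers; the columns chosen below are only illustrative:
SELECT o.id, o.order_date, c.fname, c.lname
FROM sales_order o, customer c
WHERE o.cust_id = c.id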
♦ Control statements
♦ Error handling
♦ User-defined functions
Row-level triggers are not part of the Transact-SQL compatibility features,
and are discussed in “Using Procedures, Triggers, and Batches” on page 645.
Description of unsupported or different Transact-SQL triggers Features of Transact-SQL triggers that are either unsupported or different in
Adaptive Server Anywhere include:
♦ Triggers firing other triggers Suppose a trigger carries out an action
that would, if carried out directly by a user, fire another trigger. Adaptive
Server Anywhere and Adaptive Server Enterprise respond slightly
differently to this situation. By default in Adaptive Server Enterprise,
triggers fire other triggers up to a configurable nesting level, which has
the default value of 16. You can control the nesting level with the
Adaptive Server Enterprise nested triggers option. In Adaptive Server
Anywhere, triggers fire other triggers without limit unless there is
insufficient memory.
Automatic translation of stored procedures
In addition to supporting Transact-SQL alternative syntax, Adaptive Server
Anywhere provides aids for translating statements between the Watcom-SQL
and Transact-SQL dialects. Functions returning information about SQL
statements and enabling automatic translation of SQL statements include:
♦ SQLDialect(statement) Returns Watcom-SQL or Transact-SQL.
♦ WatcomSQL(statement) Returns the Watcom-SQL syntax for the
statement.
♦ TransactSQL(statement) Returns the Transact-SQL syntax for the
statement.
These are functions, and so can be accessed using a select statement from
Interactive SQL. For example:
select SqlDialect(’select * from employee’)
3. Right-click the procedure you want to translate and from the popup menu
choose one of the Translate to commands, depending on the dialect you
want to use.
The procedure appears in the right pane in the selected dialect. If the
selected dialect is not the one in which the procedure is stored, the server
translates it to that dialect. Any untranslated lines appear as comments.
Example of Watcom-SQL procedure The following is the corresponding Adaptive Server Anywhere procedure:
CREATE PROCEDURE showdept(in deptname varchar(30))
RESULT ( lastname char(20), firstname char(20))
BEGIN
SELECT employee.emp_lname, employee.emp_fname
FROM department, employee
WHERE department.dept_name = deptname
AND department.dept_id = employee.dept_id
END
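The Transact-SQL procedure that this corresponds to is not included in this excerpt; assuming the same logic, a sketch of it might look like this:
CREATE PROCEDURE showdept @deptname varchar(30)
AS
    SELECT employee.emp_lname, employee.emp_fname
    FROM department, employee
    WHERE department.dept_name = @deptname
    AND department.dept_id = employee.dept_id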
Variables in Transact-SQL procedures
Adaptive Server Anywhere uses the SET statement to assign values to
variables in a procedure. In Transact-SQL, values are assigned using either
the SELECT statement with an empty table-list, or the SET statement. The
following simple procedure illustrates how the Transact-SQL syntax works:
CREATE PROCEDURE multiply
@mult1 int,
@mult2 int,
@result int output
AS
SELECT @result = @mult1 * @mult2
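A sketch of calling this procedure in the Transact-SQL style, assuming a connection-level variable is created to receive the output parameter:
CREATE VARIABLE @answer INT;
EXECUTE multiply 5, 6, @answer OUTPUT;
SELECT @answer;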
The following table describes the built-in procedure return values and their
meanings:
Value Meaning
–1 Missing object
–2 Data type error
–4 Permission error
–5 Syntax error
The RETURN statement can be used to return other integers, with their own
user-defined meanings.
CHAPTER 15
Differences from Other SQL Dialects
About this chapter Adaptive Server Anywhere complies completely with the SQL-92-based
United States Federal Information Processing Standard Publication (FIPS
PUB) 127.
Adaptive Server Anywhere is entry-level compliant with the ISO/ANSI
SQL-92 standard, and with minor exceptions is compliant with SQL-99 core
specifications.
Complete, detailed information about compliance is provided in the
reference documentation for each feature of Adaptive Server Anywhere.
This chapter describes those features of Adaptive Server Anywhere that are
not commonly found in other SQL implementations.
Adaptive Server Anywhere SQL features
The following features of the SQL supported by Adaptive Server Anywhere
are not found in many other SQL implementations.
Type conversions Full type conversion is implemented. Any data type can be compared with
or used in any expression with any other data type.
Dates Adaptive Server Anywhere has date, time, and timestamp types that include a
year, month, day, hour, minutes, seconds, and a fraction of a second. For
insertions or updates to date fields, or comparisons with date fields, a free
format date is supported.
In addition, the following operations are allowed on dates:
♦ date + integer Add the specified number of days to a date.
♦ date - integer Subtract the specified number of days from a date.
♦ date - date Compute the number of days between two dates.
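A small sketch of these operations; the literal dates are arbitrary:
SELECT CAST( '2000-03-16' AS DATE ) + 7 AS one_week_later,
       CAST( '2000-03-23' AS DATE ) - CAST( '2000-03-16' AS DATE ) AS days_between;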
Also, many functions are provided for manipulating dates and times. See
“SQL Functions” [ASA SQL Reference, page 85] for a description of these.
Integrity Adaptive Server Anywhere supports both entity and referential integrity.
This has been implemented via the following two extensions to the
CREATE TABLE and ALTER TABLE commands.
PRIMARY KEY ( column-name, ... )
[NOT NULL] FOREIGN KEY [role-name]
[(column-name, ...)]
REFERENCES table-name [(column-name, ...)]
[ CHECK ON COMMIT ]
The PRIMARY KEY clause declares the primary key for the relation.
Adaptive Server Anywhere will then enforce the uniqueness of the primary
key, and ensure that no column in the primary key contains the NULL value.
The FOREIGN KEY clause defines a relationship between this table and
another table. This relationship is represented by a column (or columns) in
this table which must contain values in the primary key of another table. The
system will then ensure referential integrity for these columns - whenever
these columns are modified or a row is inserted into this table, these columns
will be checked to ensure that either one or more is NULL or the values
match the corresponding columns for some row in the primary key of the
other table. For more information, see “CREATE TABLE statement” [ASA
SQL Reference, page 385].
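A sketch of these clauses in a CREATE TABLE statement; the order_line table is hypothetical and references the product table used in earlier examples:
CREATE TABLE order_line (
    order_id INTEGER NOT NULL,
    line_id  INTEGER NOT NULL,
    prod_id  INTEGER,
    PRIMARY KEY( order_id, line_id ),
    FOREIGN KEY ( prod_id ) REFERENCES product ( id ) CHECK ON COMMIT
);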
Additional functions Adaptive Server Anywhere supports several functions not in the ANSI SQL
definition. See “SQL Functions” [ASA SQL Reference, page 85] for a full list of
available functions.
Cursors When using Embedded SQL, cursor positions can be moved arbitrarily on
the FETCH statement. Cursors can be moved forward or backward relative
to the current position or a given number of records from the beginning or
end of the cursor.
PART IV
Using XML in the Database
About this chapter This chapter provides a summary of the XML support in Adaptive Server
Anywhere, including importing, exporting, storing, and querying XML data.
What is XML?
Extensible Markup Language (XML) represents structured data in text
format. XML was designed specifically for use on the Web.
XML is a simple markup language, like HTML, but is also flexible, like
SGML. XML is hierarchical, and its main purpose is to describe the
structure of data for both humans and computer software to author and read.
Rather than providing a static set of elements which describe various forms
of data, XML lets you define elements. As a result, many types of structured
data can be described with XML. XML documents can optionally use a
document type definition (DTD) or XML schema to define the structure,
elements, and attributes that are used in an XML file.
☞ For more detailed information about XML, see
http://www.w3.org/XML/ .
so that the element content contains less than and greater than signs. If you
write a query that specifies that the element content is of type XML, as
follows:
SELECT XMLFOREST( CAST( ’<hat>bowler</hat>’ AS XML ) AS product
)
then the greater than and less than signs are not quoted and you get the
following result:
<product><hat>bowler</hat></product>
However, if the query does not specify that the element content is of type
XML, for example:
SELECT XMLFOREST( ’<hat>bowler</hat>’ AS product )
then the less than and greater than signs are quoted as follows:
<product>&lt;hat&gt;bowler&lt;/hat&gt;</product>
Note that attribute content is always quoted, regardless of the data type.
☞ For more information about how element content is quoted, see “Invalid
column names” on page 525.
☞ For more information about the XML data type, see “XML data type
[Character]” [ASA SQL Reference, page 57].
Exporting relational data as XML
Adaptive Server Anywhere provides two ways to export your relational data
as XML: the Interactive SQL OUTPUT statement and the ADO.NET
DataSet object.
The FOR XML clause and SQL/XML functions allow you to generate the
results as XML from the relational data in your database. You can then
export the generated XML to a file using the UNLOAD statement or the
xp_write_file system procedure.
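A sketch of exporting generated XML with the UNLOAD statement; the file name is hypothetical and the path is interpreted on the database server:
UNLOAD
SELECT XMLELEMENT( NAME products,
         XMLAGG( XMLELEMENT( NAME product, product.name ) ) )
FROM product
TO 'products.xml';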
The <inventory> element is the root node. You can refer to it using the
following XPath expression:
/inventory
Suppose that the current node is a <quantity> element. You can refer to this
node using the following XPath expression:
.
To find all the <product> elements that are children of the <inventory>
element, use the following XPath expression:
/inventory/product
If the current node is a <product> element and you want to refer to the size
attribute, use the following XPath expression:
./@size
quantity color
54 Orange
112 Black
☞ For more information, see “OPENXML function [String]” [ASA SQL
Reference, page 181].
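The query that produced the result above is not shown in this excerpt. A sketch of an OPENXML call that would return such columns, assuming the XML document is held in a variable x and that each <product> element has <quantity> and <color> children:
SELECT *
FROM OPENXML( x, '//product' )
     WITH ( quantity INT           'quantity',
            color    VARCHAR( 20 ) 'color' );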
Using OPENXML to generate an edge table OPENXML can be used to generate an edge table, a table that contains a
row for every element in the XML document. You may wish to generate an
edge table so that you can query the data in the result set using SQL.
The following SQL statement creates a variable, x, that contains an XML
document. The XML generated by the query has a root element called
<root>, which is generated using the XMLELEMENT function, and
elements are generated for each column in the employee, sales_order, and
customer tables using FOR XML AUTO with the ELEMENTS modifier
specified.
☞ For information about the XMLELEMENT function, see
“XMLELEMENT function [String]” [ASA SQL Reference, page 235].
☞ For information about FOR XML AUTO, see “Using FOR XML
AUTO” on page 528.
CREATE VARIABLE x XML;
SET x=(SELECT XMLELEMENT( NAME root,
(SELECT * FROM employee
KEY JOIN sales_order
KEY JOIN customer
FOR XML AUTO, ELEMENTS)));
<birth_date>1964-03-15</birth_date>
<bene_health_ins>Y</bene_health_ins>
<bene_life_ins>Y</bene_life_ins>
<bene_day_care>N</bene_day_care>
<sex>M</sex>
<sales_order>
<id>2001</id>
<cust_id>101</cust_id>
<order_date>2000-03-16</order_date>
<fin_code_id>r1</fin_code_id>
<region>Eastern</region>
<sales_rep>299</sales_rep>
<customer>
<id>101</id>
<fname>Michaels</fname>
<lname>Devlin</lname>
<address>114 Pioneer Avenue</address>
<city>Kingston</city>
<state>NJ</state>
<zip>07070</zip>
<phone>2015558966</phone>
<company_name>The Power Group</company_name>
</customer>
</sales_order>
</employee>
...
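The OPENXML call that builds the edge table is not reproduced here. A sketch of such a query, assuming OPENXML metaproperty expressions such as @mp:id and @mp:localname and matching the column order of the result below, might look like this:
SELECT *
FROM OPENXML( x, '//*' )
     WITH ( id      INT            '@mp:id',
            parent  INT            '../@mp:id',
            name    VARCHAR( 128 ) '@mp:localname',
            content LONG VARCHAR   'text()' );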
The result set generated by this query shows the id of each node, the id of
the parent node, and the name and content for each element in the XML
document.
23 15 emp_id 299
47 15 manager_id 902
74 15 emp_fname Rollin
... ... ... ...
Querying XML in a column If you have a table with a column that contains XML, you can use
OPENXML to query all the XML values in the column at once. This can be
done using a lateral derived table.
The following statements create a table with two columns, manager_id and
reports. The reports column contains XML data generated from the
employee table.
CREATE TABLE t (manager_id INT, reports XML);
INSERT INTO t
SELECT manager_id, XMLELEMENT( NAME reports,
XMLAGG(
XMLELEMENT( NAME e, emp_id)))
FROM employee
GROUP BY manager_id;
manager_id   reports
1293         <reports><e>148</e><e>390</e><e>586</e><e>757</e> ... </reports>
1576         <reports><e>184</e><e>207</e><e>318</e><e>409</e> ... </reports>
902          <reports><e>129</e><e>195</e><e>299</e><e>467</e> ... </reports>
703          <reports><e>191</e><e>750</e><e>868</e><e>921</e> ... </reports>
...          ...
The following query uses a lateral derived table to generate a result set with
two columns: one that lists the id for each manager, and one that lists the id
for each employee that reports to that manager:
SELECT manager_id, eid
FROM t, LATERAL( OPENXML( t.reports, ’//e’ )
WITH (eid INT ’.’) ) dt
manager_id eid
1293 148
1293 390
1293 586
1293 757
... ...
☞ For more information about lateral derived tables, see “FROM clause”
[ASA SQL Reference, page 469].
Obtaining query results as XML
Adaptive Server Anywhere supports two different ways to obtain query
results from your relational data as XML:
♦ FOR XML clause The FOR XML clause can be used in a SELECT
statement to generate an XML document.
☞ For information about using the FOR XML clause, see “Using the
FOR XML clause to retrieve query results as XML” on page 523 and
“SELECT statement” [ASA SQL Reference, page 575].
♦ SQL/XML Adaptive Server Anywhere supports functions based on the
draft SQL/XML standard that generate XML documents from relational
data.
☞ For information about using one or more of these functions in a
query, see “Using SQL/XML to obtain query results as XML” on
page 542.
The FOR XML clause and the SQL/XML functions supported by Adaptive
Server Anywhere give you two alternatives for generating XML from your
relational data. In most cases, you can use either one to generate the same
XML.
For example, this query uses FOR XML AUTO to generate XML,
SELECT id, name
FROM product
WHERE color=’black’
FOR XML AUTO
The following sections describe the behavior of all three modes of the FOR
XML clause regarding binary data, NULL values, and invalid XML names,
as well as providing examples of how the FOR XML clause can be used.
When you use the FOR XML clause in a SELECT statement, regardless of
the mode used, any BINARY, LONG BINARY, IMAGE, and VARBINARY
columns are output as attributes or elements that are automatically
represented in base64-encoded format.
If you are using OPENXML to generate a result set from XML, OPENXML
assumes that the types BINARY, LONG BINARY, IMAGE, and
VARBINARY, are base64-encoded and decodes them automatically.
☞ For more information about OPENXML, see “OPENXML function
[String]” [ASA SQL Reference, page 181].
FOR XML and NULL values
By default, NULL values are omitted from the result. In the following fragment,
the company_name attribute is omitted from the result for Robert Michaels:
<row id="100" fname="Robert" lname="Michaels"/>
<row id="101" fname="Michaels" lname="Devlin"
company_name="The Power Group"/>
Tip
When executing queries that contain a FOR XML clause in Interactive SQL,
you may wish to increase the column length by setting the
TRUNCATION_LENGTH option.
☞ For information about setting the truncation length, see
“TRUNCATION_LENGTH option [Interactive SQL]” [ASA Database
Administration Guide, page 651] and “Options dialog: Results tab” [SQL
Anywhere Studio Help, page 148].
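For example, assuming the option can be set directly from the SQL
Statements pane like other Interactive SQL options, a statement along these
lines raises the truncation length:
SET OPTION TRUNCATION_LENGTH = 4096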
The following examples show how the FOR XML clause can be used in a
SELECT statement.
♦ The following example shows how the FOR XML clause can be used in a
subquery:
SELECT XMLELEMENT(
NAME root,
(SELECT * FROM employee
FOR XML RAW))
♦ The following example shows how the FOR XML clause can be used in a
query with a GROUP BY clause and aggregate function:
SELECT name, AVG(unit_price) AS Price
FROM product
GROUP BY name
FOR XML RAW
♦ The following example shows how the FOR XML clause can be used in a
view definition:
CREATE VIEW emp_dept
AS SELECT emp_lname, emp_fname, dept_name
FROM employee JOIN department
ON employee.dept_id = department.dept_id
FOR XML AUTO
Parameters
ELEMENTS tells FOR XML RAW to generate an XML element, instead of
an attribute, for each column in the result. If there are NULL values, the
element is omitted from the generated XML document. The following query
generates <emp_id> and <dept_name> elements:
SELECT employee.emp_id, department.dept_name
FROM employee JOIN department
ON employee.dept_id=department.dept_id
FOR XML RAW, ELEMENTS
<row>
<emp_id>160</emp_id>
<dept_name>R & D</dept_name>
</row>
<row>
<emp_id>243</emp_id>
<dept_name>R & D</dept_name>
</row>
...
The order of the results depends on the plan chosen by the optimizer, unless
you request otherwise. If you wish the results to appear in a particular order,
you must include an ORDER BY clause in the query, for example,
SELECT employee.emp_id, department.dept_name
FROM employee JOIN department
ON employee.dept_id=department.dept_id
ORDER BY emp_id
FOR XML RAW
Parameters
ELEMENTS tells FOR XML AUTO to generate an XML element, instead
of an attribute, for each column in the result. For example,
SELECT employee.emp_id, department.dept_name
FROM employee JOIN department
ON employee.dept_id=department.dept_id
ORDER BY emp_id
FOR XML AUTO, ELEMENTS
In this case, each column in the result set is returned as a separate element,
rather than as an attribute of the <employee> element. If there are NULL
values, the element is omitted from the generated XML document.
<employee>
<emp_id>102</emp_id>
<department>
<dept_name>R & D</dept_name>
</department>
</employee>
<employee>
<emp_id>105</emp_id>
<department>
<dept_name>R & D</dept_name>
</department>
</employee>
<employee>
<emp_id>160</emp_id>
<department>
<dept_name>R & D</dept_name>
</department>
</employee>
...
Usage
When you execute a query using FOR XML AUTO, data in BINARY,
LONG BINARY, IMAGE, and VARBINARY columns is automatically
returned in base64-encoded format. By default, NULL values are omitted
from the result. You can return NULL values as empty attributes by setting
the FOR_XML_NULL_TREATMENT option to EMPTY.
☞ For information about setting the FOR_XML_NULL_TREATMENT
option, see “FOR_XML_NULL_TREATMENT option [database]” [ASA
Database Administration Guide, page 612].
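For example, a statement along the following lines (a sketch; 'Empty' is the
assumed value for this behavior) returns NULLs as empty attributes:
SET OPTION FOR_XML_NULL_TREATMENT = 'Empty'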
Unless otherwise requested, the database server returns the rows of a table in
an order that has no meaning. If you wish the results to appear in a particular
order, or for a parent element to have multiple children, you must include an
ORDER BY clause in the query so that all children are adjacent. If you do
not specify an ORDER BY clause, the nesting of the results depends on the
plan chosen by the optimizer and you may not get the nesting you desire.
FOR XML AUTO does not return a well-formed XML document because
the document does not have a single root node. If a <root> element is
required, one way to insert one is to use the XMLELEMENT function. For
example,
SELECT XMLELEMENT( NAME root,
(SELECT emp_id AS id, emp_fname AS name
FROM employee FOR XML AUTO ) )
by specifying aliases. The following query renames the id attribute to
product_id:
SELECT id AS product_id
FROM product
WHERE color='black'
FOR XML AUTO
You can also rename the table with an alias. The following query renames
the table to product_info:
SELECT id AS product_id
FROM product AS product_info
WHERE color='black'
FOR XML AUTO
Example
The following query generates XML that contains both <employee> and
<department> elements, and the <employee> element (the table listed first
in the select list) is the parent of the <department> element.
SELECT employee.emp_id, department.dept_name
FROM employee JOIN department
ON employee.dept_id=department.dept_id
ORDER BY emp_id
FOR XML AUTO
<employee emp_id="160">
<department dept_name="R & D"/>
</employee>
<employee emp_id="243">
<department dept_name="R & D"/>
</employee>
...
If you change the order of the columns in the select list as follows:
SELECT department.dept_name, employee.emp_id
FROM employee JOIN department
ON employee.dept_id=department.dept_id
ORDER BY 1, 2
FOR XML AUTO
Again, the XML generated for the query contains both <employee> and
<department> elements, but in this case the <department> element is the
parent of the <employee> element.
Parameters
In EXPLICIT mode, the first two columns in the SELECT statement must be
named Tag and Parent, respectively. Tag and Parent are metadata columns,
and their values are used to determine the parent-child relationship, or
nesting, of the elements in the XML document that is returned by the query.
♦ Tag column This is the first column specified in the select list. The Tag
column stores the tag number of the current element. Permitted values for
tag numbers are 1 to 255.
♦ Parent column This column stores the tag number for the parent of the
current element. If the value in this column is NULL, the row is placed at
the top level of the XML hierarchy.
For example, consider a query that returns the following result set when
FOR XML EXPLICIT is not specified. (The purpose of the first_name!1
and id!2 data columns is discussed in the following section, “Adding data
columns to the query” on page 532).
However, if the second row had the value 1 in the Parent column, the result
would look as follows:
<first_name>Beth
<id>102</id>
</first_name>
ElementName the name of the element. For a given row, the name of the
element generated for the row is taken from the ElementName field of the
first column with a matching tag number. If there are multiple columns with
the same TagNumber, the ElementName is ignored for subsequent columns
with the same TagNumber. In the example above, the first row generates an
element called <first_name>.
TagNumber the tag number of the element. For a row with a given tag
value, all columns with the same value in their TagNumber field will
contribute content to the element that corresponds to that row.
AttributeName specifies that the column value is an attribute of the
ElementName element. For example, if a data column had the name
prod_id!1!color, then color would appear as an attribute of the <prod_id>
element.
Directive this optional field allows you to control the format of the XML
document further. You can specify any one of the following values for
Directive :
♦ hide indicates that this column is ignored for the purpose of generating
the result. This directive can be used to include columns that are only
used to order the table. The attribute name is ignored and does not appear
in the result.
☞ For an example using the hide directive, see “Using the hide
directive” on page 539.
♦ xml indicates that the column value is inserted with no quoting. If the
AttributeName is specified, the value is inserted as an element with that
name. Otherwise, it is inserted with no wrapping element. If this
directive is not used, then markup characters are quoted unless the
column is of type XML. For example, the value <a/> would be inserted
as &lt;a/&gt;.
☞ For an example using the xml directive, see “Using the xml
directive” on page 540.
Usage
Data in BINARY, LONG BINARY, IMAGE, and VARBINARY columns is
automatically returned in base64-encoded format when you execute a query
that contains FOR XML EXPLICIT. By default, any NULL values in the
result set are omitted. You can change this behavior by changing the setting
of the FOR_XML_NULL_TREATMENT option.
☞ For more information about the FOR_XML_NULL_TREATMENT
option, see “FOR_XML_NULL_TREATMENT option [database]” [ASA
Database Administration Guide, page 612] and “FOR XML and NULL values”
on page 524.
Writing an EXPLICIT mode query
Suppose you want to write a query using FOR XML EXPLICIT that
generates the following XML document:
<employee emp_id='129'>
<customer cust_id='107' region='Eastern'/>
<customer cust_id='119' region='Western'/>
<customer cust_id='131' region='Eastern'/>
</employee>
<employee emp_id='195'>
<customer cust_id='109' region='Eastern'/>
<customer cust_id='121' region='Central'/>
</employee>
You do this by writing a SELECT statement that returns the following result
set in the exact order specified, and then appending FOR XML EXPLICIT
to the query.
When you write your query, only some of the columns for a given row
become part of the generated XML document. A column is included in the
XML document only if the value in the TagNumber field (the second field in
the column name) matches the value in the Tag column.
In the example, the third column is used for the two rows that have the value
1 in their Tag column. In the fourth and fifth columns, the values are used
for the rows that have the value 2 in their Tag column. The element names
are taken from the first field in the column name. In this case, <employee>
and <customer> elements are created.
The attribute names come from the third field in the column name, so an
emp_id attribute is created for <employee> elements, while cust_id and
region attributes are generated for <customer> elements.
The following steps explain how to construct the FOR XML EXPLICIT
query that generates an XML document similar to the one found at the
beginning of this section using the sample database.
Note
If you are writing an EXPLICIT mode query that uses a UNION, then
only the column names specified in the first SELECT statement are
used. Column names that are to be used as element or attribute names
must be specified in the first SELECT statement because column names
specified in subsequent SELECT statements are ignored.
1. To generate the <employee> elements, your first
SELECT statement is as follows:
SELECT
1 AS tag,
NULL AS parent,
emp_id AS [employee!1!emp_id],
NULL AS [customer!2!cust_id],
NULL AS [customer!2!region]
FROM employee
2. Write a second SELECT statement that generates the <customer>
elements, using tag number 2 and parent 1:
SELECT
2,
1,
emp_id,
cust_id,
region
FROM employee KEY JOIN sales_order
3. Add a UNION ALL to the query to combine the two SELECT statements
together:
SELECT
1 AS tag,
NULL AS parent,
emp_id AS [employee!1!emp_id],
NULL AS [customer!2!cust_id],
NULL AS [customer!2!region]
FROM employee
UNION ALL
SELECT
2,
1,
emp_id,
cust_id,
region
FROM employee KEY JOIN sales_order
4. Add an ORDER BY clause to specify the order of the rows in the result.
The order of the rows is the order that is used in the resulting document.
SELECT
1 AS tag,
NULL AS parent,
emp_id AS [employee!1!emp_id],
NULL AS [customer!2!cust_id],
NULL AS [customer!2!region]
FROM employee
UNION ALL
SELECT
2,
1,
emp_id,
cust_id,
region
FROM employee KEY JOIN sales_order
ORDER BY 3, 1
FOR XML EXPLICIT
The following query generates three types of elements: <emp>,
<order>, and <dept>. The <emp> element has id and name attributes, the
<order> element has a date attribute, and the <dept> element has a name
attribute.
SELECT
1 tag,
NULL parent,
emp_id [emp!1!id],
emp_fname [emp!1!name],
NULL [order!2!date],
NULL [dept!3!name]
FROM employee
UNION ALL
SELECT
2,
1,
emp_id,
NULL,
order_date,
NULL
FROM employee KEY JOIN sales_order
UNION ALL
SELECT
3,
1,
emp_id,
NULL,
NULL,
dept_name
FROM employee e JOIN department d
ON e.dept_id=d.dept_id
ORDER BY 3, 1
FOR XML EXPLICIT
Using the element directive
If you wish to generate sub-elements rather than attributes, you can add the
element directive to the query, as follows:
SELECT
1 tag,
NULL parent,
emp_id [emp!1!id!element],
emp_fname [emp!1!name!element],
NULL [order!2!date!element],
NULL [dept!3!name!element]
FROM employee
UNION ALL
SELECT
2,
1,
emp_id,
NULL,
order_date,
NULL
FROM employee KEY JOIN sales_order
UNION ALL
SELECT
3,
1,
emp_id,
NULL,
NULL,
dept_name
FROM employee e JOIN department d
ON e.dept_id=d.dept_id
ORDER BY 3, 1
FOR XML EXPLICIT
<emp>
<id>129</id>
<name>Philip</name>
<order>
<date>2000-07-24</date>
</order>
<order>
<date>2000-07-13</date>
</order>
<order>
<date>2000-06-24</date>
</order>
...
<dept>
<name>Sales</name>
</dept>
</emp>
...
Using the hide directive
In the following query, the employee ID is used to order the result, but the
employee ID does not appear in the result because the hide directive is
specified:
SELECT
1 tag,
NULL parent,
emp_id [emp!1!id!hide],
emp_fname [emp!1!name],
NULL [order!2!date],
NULL [dept!3!name]
FROM employee
UNION ALL
SELECT
2,
1,
emp_id,
NULL,
order_date,
NULL
FROM employee KEY JOIN sales_order
UNION ALL
SELECT
3,
1,
emp_id,
NULL,
NULL,
dept_name
FROM employee e JOIN department d
ON e.dept_id=d.dept_id
ORDER BY 3, 1
FOR XML EXPLICIT
This query returns the following result:
<emp name="Fran">
<dept name="R & D"/>
</emp>
<emp name="Matthew">
<dept name="R & D"/>
</emp>
<emp name="Philip">
<order date="2000-07-24"/>
<order date="2000-07-13"/>
<order date="2000-06-24"/>
<order date="2000-06-08"/>
...
<dept name="Sales"/>
</emp>
<emp name="Julie">
<dept name="Finance"/>
</emp>
...
Using the xml directive
By default, when the result of a FOR XML EXPLICIT query contains
characters that are not valid XML characters, the invalid characters are
escaped (for more information, see "Invalid column names" on page 525)
unless the column is of type XML. For example, the following query
generates XML that contains an ampersand (&):
SELECT
1 AS tag,
NULL AS parent,
id AS [customer!1!id!element],
company_name AS [customer!1!company_name]
FROM customer
WHERE id = '115'
FOR XML EXPLICIT
In the result generated by this query, the ampersand is quoted because the
column is not of type XML:
<customer company_name="Sterling &amp; Co.">
<id>115</id>
</customer>
The xml directive indicates that the column value is inserted into the
generated XML with no quoting. If you execute the same query as above
with the xml directive:
SELECT
1 AS tag,
NULL AS parent,
id AS [customer!1!id!element],
company_name AS [customer!1!company_name!xml]
FROM customer
WHERE id = '115'
FOR XML EXPLICIT
When the xml directive is used, the value is not quoted in the result. Column
values can also be wrapped in CDATA sections; data contained in a CDATA
section is not quoted, as in the following result:
<product id="300">
<![CDATA[Tank Top]]>
</product>
<product id="301">
<![CDATA[V-neck]]>
</product>
<product id="302">
<![CDATA[Crew Neck]]>
</product>
<product id="400">
<![CDATA[Cotton Cap]]>
</product>
...
Using SQL/XML to obtain query results as XML
SQL/XML is a draft standard that describes a functional integration of XML
into the SQL language: it describes the ways that SQL can be used in
conjunction with XML. The supported functions allow you to write queries
that construct XML documents from relational data.
Invalid names and SQL/XML
In SQL/XML, expressions that are not legal XML names, for example
expressions that include spaces, are quoted in the same manner as in the
FOR XML clause. Element content of type XML is not quoted.
☞ For more information about quoting invalid expressions, see “Invalid
column names” on page 525.
☞ For information about using the XML data type, see “Storing XML
documents in relational databases” on page 513.
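The department lists shown below can be produced by combining
XMLELEMENT, XMLATTRIBUTES, and XMLAGG in a grouped query. The
following is a sketch of such a query, not necessarily the original example:
SELECT XMLELEMENT( NAME department,
                   XMLATTRIBUTES( dept_id ),
                   XMLAGG( XMLELEMENT( NAME name, emp_lname )
                           ORDER BY emp_lname ) ) AS dept_list
FROM employee
GROUP BY dept_id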
dept_list
<department dept_id="100">
<name>Breault</name>
<name>Cobb</name>
<name>Diaz</name>
<name>Driscoll</name>
...
</department>
<department dept_id="200">
<name>Chao</name>
<name>Chin</name>
<name>Clark</name>
<name>Dill</name>
...
</department>
<department dept_id="300">
<name>Bigelow</name>
<name>Coe</name>
<name>Coleman</name>
<name>Davidson</name>
...
</department>
...
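The Employee_Name values that follow are the kind of output produced by
XMLCONCAT, which concatenates XML values; a sketch of such a query is:
SELECT XMLCONCAT( XMLELEMENT( NAME first_name, emp_fname ),
                  XMLELEMENT( NAME last_name, emp_lname ) ) AS Employee_Name
FROM employee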
Employee_Name
<first_name>Fran</first_name>
<last_name>Whitney</last_name>
<first_name>Matthew</first_name>
<last_name>Cobb</last_name>
<first_name>Philip</first_name>
<last_name>Chin</last_name>
<first_name>Julie</first_name>
<last_name>Jordan</last_name>
...
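Results like those in the next table can be produced by nesting XMLFOREST
inside XMLELEMENT. The following query is a sketch; the item_name and
quantity_left aliases are assumptions:
SELECT id, XMLELEMENT( NAME product_info,
                       XMLFOREST( name AS item_name,
                                  quantity AS quantity_left,
                                  description ) ) AS results
FROM product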
id results
301 <product_info>
<item_name>Tee Shirt
</item_name>
<quantity_left>54
</quantity_left>
<description>Medium Orange
Tee Shirt</description>
</product_info>
302 <product_info>
<item_name>Tee Shirt
</item_name>
<quantity_left>75
</quantity_left>
<description>One size fits
all Black Tee Shirt
</description>
</product_info>
400 <product_info>
<item_name>Baseball Cap
</item_name>
<quantity_left>112
</quantity_left>
<description>One size fits
all Black Baseball Cap
</description>
</product_info>
... ...
Specifying element content
The XMLELEMENT function allows you to specify the content of an
element. The following statement produces an XML element with the
content hat.
SELECT id, XMLELEMENT( NAME product_type, 'hat' )
FROM product
WHERE name IN ( 'Baseball Cap', 'Visor' )
Generating elements with attributes
You can add attributes to the elements by including the
attribute-value-expression argument in your query. This argument specifies
the attribute name and content. The following statement produces an
attribute for the name, color, and unit_price of each item.
SELECT id, XMLELEMENT( NAME item_description,
XMLATTRIBUTES( name,
color,
unit_price )
) AS item_description_element
FROM product
WHERE id > 400
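Using XMLFOREST instead of XMLATTRIBUTES returns the values as child
elements rather than attributes. A query along these lines (a sketch; price is
an assumed alias) produces results like those shown in the next table:
SELECT id, XMLELEMENT( NAME item_description,
                       XMLFOREST( name,
                                  color,
                                  unit_price AS price ) ) AS product_info
FROM product
WHERE id > 400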
id product_info
401 <item_description>
<name>Baseball Cap</name>
<color>White</color>
<price>10.00</price>
</item_description>
500 <item_description>
<name>Visor</name>
<color>White</color>
<price>7.00</price>
</item_description>
501 <item_description>
<name>Visor</name>
<color>Black</color>
<price>7.00</price>
</item_description>
... ...
☞ For more information, see “XMLFOREST function [String]” [ASA SQL
Reference, page 236].
SELECT XMLGEN ( '<order>
<id>{$id}</id>
<date>{$order_date}</date>
<customer>{$customer}</customer>
</order>',
sales_order.id,
sales_order.order_date,
customer.company_name AS customer)
FROM sales_order JOIN customer
ON customer.id = sales_order.cust_id
order_info
<order>
<id>2131</id>
<date>2000-01-02</date>
<customer>BoSox Club</customer>
</order>
<order>
<id>2126</id>
<date>2000-01-03</date>
<customer>Leisure Time</customer>
</order>
<order>
<id>2065</id>
<date>2000-01-03</date>
<customer>Bloomfield's</customer>
</order>
<order>
<id>2127</id>
<date>2000-01-06</date>
<customer>Creative Customs
Inc.</customer>
</order>
...
Generating attributes
If you want the order ID number to appear as an attribute of the <order>
element, you would write the query as follows (note that the variable
reference is contained in double quotes because it specifies an attribute
value):
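A sketch of such a query, moving the id variable reference into the opening
<order> tag (the AS order_info alias is an assumption), is:
SELECT XMLGEN ( '<order id="{$id}">
<date>{$order_date}</date>
<customer>{$customer}</customer>
</order>',
sales_order.id,
sales_order.order_date,
customer.company_name AS customer ) AS order_info
FROM sales_order JOIN customer
ON customer.id = sales_order.cust_id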
order_info
<order id="2131">
<date>2000-01-02</date>
<customer>BoSox Club</customer>
</order>
<order id="2126">
<date>2000-01-03</date>
<customer>Leisure Time</customer>
</order>
<order id="2065">
<date>2000-01-03</date>
<customer>Bloomfield's</customer>
</order>
<order id="2127">
<date>2000-01-06</date>
<customer>Creative Customs
Inc.</customer>
</order>
...
Specifying header information for XML documents
The FOR XML clause and the SQL/XML functions supported by Adaptive
Server Anywhere do not include header information in the XML documents
they generate. You can use the XMLGEN function to generate header
information.
SELECT XMLGEN( '<?xml version="1.0"
encoding="ISO-8859-1" ?>
<r>{$x}</r>',
(SELECT fname, lname FROM customer FOR XML RAW)
AS x )
PART V
This part describes how to load and unload your database, and how to access
remote data.
CHAPTER 17
Importing and Exporting Data
About this chapter
This chapter describes the Adaptive Server Anywhere tools and utilities that
help you achieve your importing and exporting goals, including SQL,
Interactive SQL, the dbunload utility, and Sybase Central wizards.
Contents Topic: page
Importing 560
Exporting 565
Introduction to import and export
Transferring large amounts of data into and from your database may be
necessary in several situations. For example,
♦ Importing an initial set of data into a new database.
♦ Exporting data from your database for use with other applications, such
as spreadsheets.
♦ If you are using the INPUT command, run Interactive SQL or the client
application on the same machine as the server. Loading data over the
network adds extra communication overhead. This might mean loading
new data during off hours.
♦ Place data files on a separate physical disk drive from the database. This
could avoid excessive disk head movement during the load.
♦ If you are using the INPUT command, start the server with the -b option
for bulk operations mode. In this mode, the server does not keep a
rollback log or a transaction log, it does not perform an automatic
COMMIT before data definition commands, and it does not lock any
records.
The server allows only one connection when you use the -b option.
Without a rollback log, you cannot use savepoints and aborting a
command always causes transactions to roll back. Without automatic
COMMIT, a ROLLBACK undoes everything since the last explicit
COMMIT.
Without a transaction log, there is no log of the changes. You should back
up the database before and after using bulk operations mode because, in
this mode, your database is not protected against media failure. For more
information, see “Backup and Data Recovery” [ASA Database
Administration Guide, page 343].
If you have data that requires many commits, running with the -b option
may slow database operation. At each COMMIT, the server carries out a
checkpoint; this frequent checkpointing can slow the server.
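For example, a personal database server might be started in bulk operations
mode with a command along these lines (the dbeng9 executable name and
database file are illustrative):
dbeng9 -b asademo.db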
Importing and exporting data
You can import individual tables or portions of tables from other database
file formats, or from ASCII files. Depending on the format of the data you
are inserting, you have some flexibility about creating the table before the
import or during the import. You may find importing a useful tool if you
need to add large amounts of data to your database at a time.
You can export individual tables and query results in ASCII format, or in a
variety of formats supported by other database programs. You may find
exporting a useful tool if you need to share large portions of your database,
or extract portions of your database according to particular criteria.
Although Adaptive Server Anywhere import and export procedures work on
one table at a time, you can create scripts that effectively automate the
import or export procedure, allowing you to import and export data into
or from a number of tables consecutively.
You can insert (append) data into tables, and you can replace data in tables.
In some cases, you can also create new tables at the same time as you import
the data. If you are trying to create a whole new database, however, consider
loading the data instead of importing it, for performance reasons.
You can export query results, table data, or table schema. If you are trying to
export a whole database, however, consider unloading the database instead
of exporting data, for performance reasons.
☞ For more information about loading and unloading complete databases,
see “Rebuilding databases” on page 572.
You can import and export files between Adaptive Server Anywhere and
Adaptive Server Enterprise using the BCP FORMAT clause.
☞ For more information, see “Adaptive Server Enterprise compatibility”
on page 589.
Data formats
Interactive SQL supports the following import and export file formats:
during importing. For example, the column data types may be different or in
a different order, or there may be extra values in the import data that do not
match columns in the destination table.
Rearranging the table or data
If you know that the structure of the data you want to import does not match
the structure of the destination table, you have several options. You can
rearrange the columns in your table using the LOAD TABLE statement; you
can rearrange the import data to fit the table using a variation of the INSERT
statement and a global temporary table; or you can use the INPUT statement
to specify a specific set or order of columns.
Allowing columns to contain NULLs
If the file you are importing contains data for a subset of the columns in a
table, or if the columns are in a different order, you can also use the LOAD
TABLE statement DEFAULTS option to fill in the blanks and merge
non-matching table structures.
If DEFAULTS is OFF, any column not present in the column list is assigned
NULL. If DEFAULTS is OFF and a non-nullable column is omitted from the
column list, the database server attempts to convert the empty string to the
column’s type. If DEFAULTS is ON and the column has a default value, that
value is used.
For example, to load two columns into the employee table, and set the
remaining column values to the default values if there are any, the LOAD
TABLE statement should look like this:
LOAD TABLE employee (emp_lname, emp_fname)
FROM 'new_employees.txt'
DEFAULTS ON
Merging different table structures
You can rearrange the import data to fit the table using a variation of the
INSERT statement and a global temporary table (see the sketch after these
steps):
1. Create a global temporary table whose structure matches the data file
you want to import.
2. Use the LOAD TABLE statement to load your data into the global
temporary table.
When you close the database connection, the data in the global temporary
table disappears. However, the table definition remains. You can use it
the next time you connect to the database.
3. Use the INSERT statement with a FROM SELECT clause to extract and
summarize data from the temporary table and put it into one or more
permanent database tables.
Outputting NULLs
Users often want to extract data for use in other software products. Since the
other software package may not understand NULL values, there are two
ways of specifying how NULL values are output. You can use either the
Interactive SQL NULLS option, or the IFNULL function. Both options
allow you to output a specific value in place of a NULL value.
Use the Interactive SQL NULLS option to set the default behavior, or to
change the output value for a particular session. Use the IFNULL function
to apply the output value to a particular instance or query.
Specifying how NULL values are output provides for greater compatibility
with other software packages.
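For example, the NULLS option might be set for the session, or the IFNULL
function applied in a single query. The following is a sketch; the replacement
text is illustrative, and it assumes the option can be set with SET OPTION
from Interactive SQL:
SET OPTION nulls = '(none)';
SELECT id,
       IFNULL( company_name, '(none)', company_name )
FROM customer;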
Importing
Following is a summary of import tools, followed by instructions for
importing databases, data, and tables.
Import tools
There are a variety of tools available to help you import your data.
Interactive SQL Import wizard
You can access the import wizard by choosing Data ➤ Import from the
Interactive SQL menu. The wizard provides an interface to allow you to
choose a file to import, a file format, and a destination table to place the data
in. You can choose to import this data into an existing table, or you can use
the wizard to create and configure a completely new table.
Choose the Interactive SQL Import wizard when you prefer using a
graphical interface to import data in a format other than text, or when you
want to create a table at the same time you import the data.
INPUT statement
You execute the INPUT statement from the SQL Statements pane of the
Interactive SQL window. The INPUT statement allows you to import data in
a variety of file formats into one or more tables. You can choose a default
input format, or you can specify the file format on each INPUT statement.
Interactive SQL can execute a command file containing multiple INPUT
statements.
If a data file is in DBASE, DBASEII, DBASEIII, FOXPRO, or LOTUS
format and the table does not exist, it will be created. There are performance
impacts associated with importing large amounts of data with the INPUT
statement, since the INPUT statement writes everything to the Transaction
log.
Choose the Interactive SQL INPUT statement when you want to import data
into one or more tables, when you want to automate the import process using
a command file, or when you want to import data in a format other than text.
☞ For more information, see “INPUT statement [Interactive SQL]” [ASA
SQL Reference, page 501].
LOAD TABLE statement
The LOAD TABLE statement allows you to import data only, into a table, in
an efficient manner in text/ASCII/FIXED formats. The table must exist and
have the same number of columns as the input file has fields, defined on
compatible data types. The LOAD TABLE statement imports with one row
per line, with values separated by a delimiter.
Use the LOAD TABLE statement when you want to import data in text
format. If you have a choice between using the INPUT statement or the
LOAD TABLE statement, choose the LOAD TABLE statement for better
performance.
☞ For more information, see “LOAD TABLE statement” [ASA SQL
Reference, page 516].
INSERT statement
Since you include the data you want to place in your table directly in the
INSERT statement, it is considered interactive input. File formats are not an
issue. You can also use the INSERT statement with remote data access to
import data from another database rather than a file.
Choose the INSERT statement when you want to import small amounts of
data into a single table.
☞ For more information, see “INSERT statement” [ASA SQL Reference,
page 506].
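For example, a simple interactive insert might look like this (the values are
illustrative):
INSERT INTO department ( dept_id, dept_name, dept_head_id )
VALUES ( 600, 'Training', 501 )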
Proxy Tables
You can import data directly from another database. Using the Adaptive
Server Anywhere remote data access feature, you can create a proxy table,
which represents a table from the remote database, and then use an INSERT
statement with a SELECT clause to insert data from the remote database into
a permanent table in your database.
☞ For more information about remote data access, see “Accessing Remote
Data” on page 591.
Importing databases
You can use either the Interactive SQL Import wizard or the INPUT
statement to create a database by importing one table at a time. You can also
create a script that automates this process. However, for more efficient
results, consider reloading a database whenever possible.
☞ For more information about importing a database that was previously
unloaded, see “Reloading a database” on page 576.
Importing data
3. Specify how the database values are stored in the file you are importing.
4. Select the Use An Existing Table option and then enter the name and
location of the existing table. Click Next.
You can click the Browse button and locate the table you want to import
the data into.
5. For ASCII files, you can specify the field delimiter for the file, as well as
the escape character, whether trailing blanks should be included, and the
encoding for the file. The encoding option allows you to specify the code
page that is used to read the file to ensure that characters are treated
correctly. If you select (Default), the default encoding for the machine
Interactive SQL is running on is used.
☞ For information about supported code pages, see “Supported code
pages” [ASA Database Administration Guide, page 297].
6. Follow the remaining instructions in the wizard.
Importing appends the new data to the existing table. If the import is
successful, the Messages pane displays the amount of time it took to
import the data. If the import is unsuccessful, a message appears
indicating the import was unsuccessful.
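A minimal INPUT statement for this purpose has the following shape
(FORMAT ASCII is an assumption; use the format that matches your file):
INPUT INTO t1
FROM 'file1'
FORMAT ASCII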
Where t1 is the name of the table you want to place the data in, and file1
is the name of the file that holds the data you want to import.
Importing a table
The LOAD TABLE statement appends the contents of the file to the
existing rows of the table; it does not replace the existing rows in the
table. You can use the TRUNCATE TABLE statement to remove all the
rows from a table.
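For example (a sketch using a hypothetical table and data file):
-- Remove the existing rows, then load the new data
TRUNCATE TABLE sales_staging;
LOAD TABLE sales_staging FROM 'sales_staging.txt';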
Neither the TRUNCATE TABLE statement nor the LOAD TABLE
statement fires triggers, including referential integrity actions such as
cascaded deletes.
The LOAD TABLE statement has an optional STRIP clause. The default
setting (STRIP ON) strips trailing blanks from values before inserting
them. To keep trailing blanks, use the STRIP OFF clause in your LOAD
TABLE statement.
For more information about the LOAD TABLE statement syntax, see
“LOAD TABLE statement” [ASA SQL Reference, page 516].
Exporting
Following is a summary of export tools, followed by instructions for
exporting query results, databases, and tables.
Export tools
There are a variety of tools available to help you export your data.
Exporting data from Interactive SQL
You can export data from Interactive SQL by choosing Export from the Data
menu. This allows you to choose the format of the exported query results.
OUTPUT statement
You can export query results, tables, or views from your database using the
Interactive SQL OUTPUT statement. The Interactive SQL OUTPUT
statement supports several different file formats. You can either specify the
default output format, or you can specify the file format on each OUTPUT
statement. Interactive SQL can execute a command file containing multiple
OUTPUT statements.
There are performance impacts associated with exporting large amounts of
data with the OUTPUT statement. As well, you should use the OUTPUT
statement on the same machine as the server if possible to avoid sending
large amounts of data across the network.
Choose the Interactive SQL OUTPUT statement when you want to export all
or part of a table or view in a format other than text, or when you want to
automate the export process using a command file.
☞ For more information, see “OUTPUT statement [Interactive SQL]” [ASA
SQL Reference, page 534].
UNLOAD TABLE statement
You execute the UNLOAD TABLE statement from the SQL Statements
pane of the Interactive SQL window. It allows you to export data only, in an
efficient manner in text/ASCII/FIXED formats. The UNLOAD TABLE
statement exports with one row per line, and values separated by a comma
delimiter. The data exports in order by primary key values to make reloading
quicker.
Choose the UNLOAD TABLE statement when you want to export entire
tables in text format. If you have a choice between using the OUTPUT
statement, UNLOAD statement, or UNLOAD TABLE statement, choose the
UNLOAD TABLE statement for performance reasons.
☞ For more information, see “UNLOAD TABLE statement” [ASA SQL
Reference, page 626].
UNLOAD statement
The UNLOAD statement is similar to the OUTPUT statement in that they
both export query results to a file. The UNLOAD statement, however, allows
you to export data in a more efficient manner and in text/ASCII/FIXED
formats only. The UNLOAD statement exports with one row per line, with
values separated by a comma delimiter.
To use the UNLOAD statement, the user must have ALTER or SELECT
permission on the table. For more information about controlling who can use
the UNLOAD statement, see “-gl server option” [ASA Database Administration
Guide, page 149].
Choose the UNLOAD statement when you want to export query results if
performance is an issue, and if output in text format is acceptable. The
UNLOAD statement is also a good choice when you want to embed an
export command in an application.
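For example, the following statement (the file name is illustrative) writes a
query result directly to a text file on the server:
UNLOAD
SELECT emp_id, emp_lname, emp_fname
FROM employee
TO 'c:\\temp\\emp.txt'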
When unloading and reloading a database that has proxy tables, you must
create an external login to map the local user to the remote user, even if the
user has the same password on both the local and remote databases. If you
do not have an external login, the reload may fail because you cannot
connect to the remote server.
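For example, a statement along the following lines creates an external login
(the server, login, and password names are illustrative):
CREATE EXTERNLOGIN DBA
TO remote_asa
REMOTE LOGIN remote_dba
IDENTIFIED BY 'remote_password'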
☞ For more information, see “UNLOAD statement” [ASA SQL Reference,
page 624].
Dbunload utility
The dbunload utility and Sybase Central are graphically different but
functionally equivalent. You can use either one interchangeably to produce
the same results. These tools are different from Interactive SQL statements
in that they can operate on several tables at once. And in addition to
exporting table data, both tools can also export table schema.
If you want to rearrange your tables in the database, you can use dbunload to
create the necessary command files and modify them as needed. Sybase
Central provides wizards and a GUI interface for unloading one, many, or all
of the tables in a database, and dbunload provides command line options for
the same activities. Tables can be unloaded with structure only, data only or
both structure and data. To unload fewer than all of the tables in a database,
a connection must be established beforehand.
You can also extract one or many tables with or without command files.
These files can be used to create identical tables in different databases.
Choose Sybase Central or the dbunload utility when you want to export in
text format, when you need to process large amounts of data quickly, when
your file format requirements are flexible, or when your database needs to be
rebuilt or extracted.
☞ For more information, see “Unloading a database using the dbunload
command-line utility” [ASA Database Administration Guide, page 548].
3. If you want to export query results and append the results to another file,
add the APPEND statement to the end of the OUTPUT statement.
For example,
SELECT *
FROM employee;
OUTPUT TO 'c:\employee.dbf' APPEND
4. If you want to export query results and include messages, add the
VERBOSE statement to the end of the OUTPUT statement.
For example,
SELECT *
FROM employee;
OUTPUT TO 'c:\employee.dbf' VERBOSE
5. If you want to specify a format other than ASCII, add a FORMAT clause
to the end of the query.
For example,
SELECT *
FROM employee;
OUTPUT TO 'c:\employee.dbf'
FORMAT dbaseiii;
where c:\employee.dbf is the path, name, and extension of the new file
and dbaseiii is the file format for this file. You can enclose the string in
single or double quotation marks, but they are only required if the path
contains embedded spaces.
Where dbaseiii is the file format for this file. If you leave the FORMAT
option out, the file type defaults to ASCII.
Tips
You can combine the APPEND and VERBOSE statements to append
both results and messages to an existing file. For example, type OUTPUT
TO 'filename.sql' APPEND VERBOSE. For more information about
APPEND and VERBOSE, see the “OUTPUT statement [Interactive SQL]”
[ASA SQL Reference, page 534].
If the export is successful, the Messages pane displays the amount of time
it took to export the query result set, the filename and path of the
exported data, and the number of rows written. If the export is
unsuccessful, a message appears indicating the export was unsuccessful.
Exporting databases
❖ To unload all or part of a database (command line)
1. At a command prompt, enter the dbunload command and specify
connection parameters using the -c option.
For example, the following command unloads the entire database to
c:\temp:
dbunload -c "dbn=asademo;uid=DBA;pwd=SQL" c:\temp
For more information about additional command line options you can apply
to the dbunload utility, see “Unloading a database using the dbunload
command-line utility” [ASA Database Administration Guide, page 548].
Exporting tables
In addition to the methods described below, you can also export a table by
selecting all the data in a table and exporting the query results. For more
information, see “Exporting query results” on page 567.
Tip
You can export views just as you would export tables.
unloads the data from the sample database (assumed to be running on the
default database server with the default database name) into a set of files
in the c:\temp directory. A command file to rebuild the database from the
data files is created with the default name reload.SQL in the current
directory.
You can unload more than one table by separating the table names with a
comma (,) delimiter.
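For example, a statement of this shape unloads a single table (a sketch):
UNLOAD TABLE department
TO 'dept.txt'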
This statement unloads the department table from the sample database
into the file dept.txt in the server’s current working directory. If you are
running against a network server, the command unloads the data into a
file on the server machine, not the client machine. Also, the file name
passes to the server as a string. Using escape backslash characters in the
file name prevents misinterpretation if a directory or file name begins
with an n (\n is a newline character) or any other special characters.
Each row of the table is output on a single line of the output file, and no
column names are exported. The columns are separated, or delimited, by
a comma. The delimiter character can be changed using the DELIMITED
BY clause. The fields are not fixed-width fields. Only the characters in
each entry are exported, not the full width of the column.
☞ For more information about the UNLOAD TABLE statement syntax,
see “UNLOAD TABLE statement” [ASA SQL Reference, page 626].
Rebuilding databases
Rebuilding a database is a specific type of import and export involving
unloading and reloading your entire database. Rebuilding your database
takes all the information out of your database and puts it back in, in a
uniform fashion, thus filling space and improving performance much like
defragmenting your disk drive.
Note
It is good practice to make backups of your database before rebuilding.
Loading and unloading are most useful for improving performance,
reclaiming fragmented space, or upgrading your database to a newer version
of Adaptive Server Anywhere.
Rebuilding is different from exporting in that rebuilding exports and imports
table definitions and schema in addition to the data. The unload portion of
the rebuild process produces ASCII format data files and a reload.SQL
file which contains table and other definitions. Running the reload.SQL
script recreates the tables and loads the data into them.
You can carry out this operation from Sybase Central or using the dbunload
command line utility.
When unloading and reloading a database that has proxy tables, you must
create an external login to map the local user to the remote user, even if the
user has the same password on both the local and remote databases. If you
do not have an external login, the reload may fail because you cannot
connect to the remote server.
☞ For more information about external logins, see “Working with external
logins” on page 602.
Consider rebuilding your database if you want to upgrade your database,
reclaim disk space or improve performance. You might consider extracting a
database (creating a new database from an old database) if you are using
SQL Remote or MobiLink.
If you need to defragment your database, and a full rebuild is not possible
due to requirements for continuous access to the database, consider
reorganizing the table instead of rebuilding.
☞ For more information about reorganizing tables, see the
“REORGANIZE TABLE statement” [ASA SQL Reference, page 555].
Rebuilding a database involved in replication
If a database is participating in replication, particular care needs to be taken
if you wish to rebuild the database.
Replication is based on the offsets in the transaction log. When you rebuild a
database, the offsets in the old transaction log are different from the offsets in
the new log, making the old log unavailable. For this reason, good backup
practices are especially important when participating in replication.
There are two ways of rebuilding a database involved in replication. The first
method uses the dbunload utility -ar option to make the unload and reload
occur in a way that does not interfere with replication. The second method is
a manual method of accomplishing the same task.
The rebuild (load/unload) and extract procedures are used to rebuild
databases and to create new databases from part of an old one.
With importing and exporting, the destination of the data is either into your
database or out of your database. Importing reads data into your database.
Exporting writes data out of your database. Often the information is either
coming from or going to another non-Adaptive Server Anywhere database.
Rebuilding, however, combines two functions: loading and unloading.
Loading and unloading takes data and schema out of an Adaptive Server Anywhere
database and then places the data and schema back into an Adaptive Server
Anywhere database. The unloading procedure produces fixed format data
files and a reload.SQL file which contains table definitions required to
recreate the table exactly. Running the reload.SQL script recreates the tables
and loads the data back into them.
Rebuilding a database can be a time consuming operation, and can require a
large amount of disk space. As well, the database is unavailable for use while
being unloaded and reloaded. For these reasons, rebuilding a database is not
advised in a production environment unless you have a definite goal in mind.
Rebuild tools
LOAD/UNLOAD TABLE statement
UNLOAD TABLE allows you to export data only, in an efficient manner, in
text/ASCII/FIXED formats. The UNLOAD TABLE statement exports with
one row per line, with values separated by a comma delimiter. To make
reloading quicker, the data exports in order by primary key values.
To use the UNLOAD TABLE statement, the user must have ALTER or
SELECT permission on the table.
Choose the UNLOAD TABLE statement when you want to export data in
text format or when performance is an issue.
☞ For more information, see “UNLOAD statement” [ASA SQL Reference,
page 624].
dbunload/dbisql utilities and Sybase Central
The dbunload/dbisql utilities and Sybase Central are graphically different
but functionally equivalent. You can use either one interchangeably to
produce the same results.
You can use the Sybase Central Unload Database wizard or the dbunload
utility to unload an entire database in ASCII comma-delimited format and to
create the necessary Interactive SQL command files to completely recreate
your database. This may be useful for creating SQL Remote extractions or
building new copies of your database with the same or a slightly modified
structure. The dbunload utility and Sybase Central are useful for exporting
Adaptive Server Anywhere files intended for reuse within Adaptive Server
Anywhere.
Choose Sybase Central or the dbunload utility when you want to rebuild or
extract from your database, export in text format, when you need to
process large amounts of data quickly, or when your file format requirements
are flexible. You can also use the Unload Database wizard to unload an
existing database into a new database.
☞ For more information, see “Rebuilding a database not involved in
replication” on page 576, “Rebuilding a database involved in replication” on
page 577, and “Unloading a database using the Unload Database wizard”
[ASA Database Administration Guide, page 548].
Instead, any empty pages are simply marked as free so they can be used
again. They are not removed from the database unless you rebuild it.
Rebuilding a database can reclaim disk space if you have deleted a large
amount of data from your database and do not anticipate adding more.
♦ Improve performance Rebuilding databases can improve performance
for the following reasons:
• If data on pages within the database is fragmented, unloading and
reloading can eliminate the fragmentation.
• Since the data can be unloaded and reloaded in order by primary keys,
access to related information can be faster, as related rows may appear
on the same or adjacent pages.
Upgrading a database
New versions of the Adaptive Server Anywhere database server can be used
without upgrading your database. If you want to use features of the new
version that require access to new system tables or database options, you
must use the upgrade utility to upgrade your database. The upgrade utility
does not unload or reload any data.
If you want to use features of the new version that rely on changes in the
database file format, you must unload and reload your database. You should
back up your database after rebuilding the database.
To upgrade your database file, use the new version of Adaptive Server
Anywhere.
For more information about upgrading your database, see “Upgrading
Adaptive Server Anywhere” [What’s New in SQL Anywhere Studio, page 200].
You can unload more than one table by separating the table names with a
comma delimiter.
3. If you want to export data only, add the -d option.
For example, if you want to export data only, your final command would
look like this:
dbunload -c "dbn=asademo;uid=DBA;pwd=SQL" -d -t employee c:\temp
Reloading a database
If you use one of these options, no interim copy of the data is created on
disk, so you do not specify an unload directory on the command line.
This provides greater security for your data. The -ar and -an options
should also execute more quickly than Sybase Central, but -ac is slower.
2. Shut down the database and archive the transaction log, before using the
reloaded database.
Notes
The -an and -ar options only apply to connections to a personal server, or
connections to a network server over shared memory.
There are additional options available for the dbunload utility that allow you
to tune the unload, as well as connection parameter options that allow you to
specify a running or non-running database and database parameters.
4. Shut down the new database. Perform validity checks that you would
usually perform after restoring a database.
5. Start the database using any production options you need. You can now
allow user access to the reloaded database.
Notes
There are additional options available for the dbunload utility that allow you
to tune the unload, as well as connection parameter options that allow you to
specify a running or non-running database and database parameters.
If the above procedure does not meet your needs, you can manually adjust
the transaction log offsets. The following procedure describes how to carry
out that operation.
4. Rename the current transaction log file so that it is not modified during
the unload process, and place this file in the dbremote off-line logs
directory.
8. Use dblog on the new database with the ending offset noted in step 3 as
the -z parameter, and also set the relative offset to zero.
dblog -x 0 -z 137829 database-name.db
9. When you run the Message Agent, provide it with the location of the
original off-line directory on its command line.
10. Start the database. You can now allow user access to the reloaded
database.
5. Shut down the production server and make copies of the database and log.
6. Copy the rebuilt database onto the production server.
7. Run DBTRAN on the log from step 5.
This should be a relatively small file.
8. Start the server on the rebuilt database, but don't allow users to connect.
9. Apply the transactions from step 7.
10. Allow users to connect.
Extracting data
Extracting creates a remote Adaptive Server Anywhere database from a
consolidated Adaptive Server Enterprise or Adaptive Server Anywhere
database.
You can use the Sybase Central Extract Database wizard or the Extraction
utility to extract databases. The Extraction utility is the recommended way of
creating and synchronizing remote databases from a consolidated database.
For more information about how to perform database extractions, see:
♦ “The Database Extraction utility” [SQL Remote User’s Guide, page 302]
♦ “Using the extraction utility” [SQL Remote User’s Guide, page 191]
6. Select the remote server you want to use to connect to the remote
database from which you want to migrate data, and then click Next.
If you have not already created a remote server, click Create Remote
Server Now to open the Remote Server Creation wizard.
☞ For more information about creating a remote server, see “Creating
remote servers” on page 596.
You can also create an external login for the remote server. By default,
Adaptive Server Anywhere uses the user ID and password of the current
user when it connects to a remote server on behalf of that user. However,
if the remote server does not have a user defined with the same user ID
and password as the current user, you must create an external login. The
external login assigns an alternate login name and password for the
current user so that user can connect to the remote server.
7. Select the tables that you want to migrate, and then click Next.
You cannot migrate system tables, so no system tables appear in this list.
8. Select the user that will own the tables on the target database, and then
click Next.
If you have not already created a user, click Create User Now to open the
User Creation wizard.
9. Select whether you want to migrate the data and/or the foreign keys from
the remote tables and whether you want to keep the proxy tables that are
created for the migration process, and then click Next.
If the target database is version 8.0.0 or earlier, the Migrate the foreign
keys option is not enabled. You must upgrade the database to version
8.0.1 or later to use this option.
☞ For more information about upgrading, see “Upgrading a database”
[What’s New in SQL Anywhere Studio, page 201].
This procedure calls several procedures in turn and migrates all the
remote tables belonging to the user l_smith using the specified criteria.
If you do not want all the migrated tables to be owned by the same user on
the target database, you must run the sa_migrate procedure for each owner
on the target database, specifying the local_table_owner and owner_name
arguments. It is recommended that you migrate tables associated with one
owner at a time.
☞ For more information, see “sa_migrate system procedure” [ASA SQL
Reference, page 766].
For target databases that are version 8.0.0 or earlier, foreign keys are
migrated automatically. If you do not want to migrate the foreign keys, you
must upgrade the database file format to version 8.0.1 or later.
☞ For more information about upgrading, see “Upgrading a database”
[What’s New in SQL Anywhere Studio, page 201].
Migrating individual tables using the sa_migrate stored procedures
❖ To import remote tables (with modifications)
1. From Interactive SQL, connect to the target database.
2. Run the sa_migrate_create_remote_table_list stored procedure. For
example,
CALL sa_migrate_create_remote_table_list( 'ase',
NULL, 'remote_a', 'mydb' )
You must specify a database name for Adaptive Server Enterprise and
Microsoft SQL Server databases.
This populates the dbo.migrate_remote_table_list table with a list of
remote tables to migrate. You can delete rows from this table for remote
tables you do not wish to migrate.
Do not supply NULL for both the table_name and owner_name
parameters. Doing so migrates all the tables in the database, including
system tables. As well, tables that have the same name but different
owners in the remote database all belong to one owner in the target
database. It is recommended that you migrate tables associated with one
owner at a time.
☞ For more information about the sa_migrate_create_remote_table_list
stored procedure, see “sa_migrate_create_remote_table_list system
procedure” [ASA SQL Reference, page 771].
3. Run the sa_migrate_create_tables stored procedure. For example,
CALL sa_migrate_create_tables( 'local_a' )
4. If you want to migrate the data from the remote tables into the base tables
on the target database, run the sa_migrate_data stored procedure. For
example,
CALL sa_migrate_data( 'local_a' )
This procedure migrates the data from each remote table into the base
table created by the sa_migrate_create_tables procedure.
☞ For more information about the sa_migrate_data stored procedure,
see “sa_migrate_data system procedure” [ASA SQL Reference, page 774].
If you do not want to migrate the foreign keys from the remote database,
you can skip to step 7.
5. Run the sa_migrate_create_remote_fks_list stored procedure. For
example,
CALL sa_migrate_create_remote_fks_list( ’ase’ )
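6. Run the sa_migrate_create_fks stored procedure to create the foreign keys on the base tables. For example,
CALL sa_migrate_create_fks( 'local_a' )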
7. If you want to drop the proxy tables that were created for migration
purposes, run the sa_migrate_drop_proxy_tables stored procedure. For example,
CALL sa_migrate_drop_proxy_tables( ’local_a’ )
This procedure drops all proxy tables created for migration purposes and
completes the migration process.
☞ For more information about the sa_migrate_drop_proxy_tables
stored procedure, see “sa_migrate_drop_proxy_tables system procedure”
[ASA SQL Reference, page 775].
Running SQL command files
This section describes how to process files consisting of a set of commands.
♦ You can load a command file into the SQL Statements pane and execute
it directly from there.
You load command files back into the SQL Statements pane by choosing
File ➤ Open. Enter temp.sql when prompted for the file name.
On Windows platforms you can make Interactive SQL the default editor
for .SQL files so that you can double-click the file and it appears in the
SQL Statements pane of Interactive SQL.
☞ For more information about making Interactive SQL the default
editor for .SQL files, see “Options dialog: General tab” [SQL Anywhere
Studio Help, page 146].
2. In the Save dialog, specify a location, name and format for the file. Click
Save when finished.
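One way to run a saved command file is with the Interactive SQL READ statement. For example:
READ 'c:\filename.sql'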
where c:\filename.sql is the path, name, and extension of the file. Single
quotation marks (as shown) are required only if the path contains spaces.
❖ To run a command file in batch mode
1. Supply a command file as a command-line argument for Interactive SQL.
For example, the following command runs the command file myscript.sql
against the sample database.
dbisql -c "dsn=ASA 9.0 Sample" myscript.sql
CHAPTER 18
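Accessing Remote Data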
About this chapter Adaptive Server Anywhere can access data located on different servers, both
Sybase and non-Sybase, as if the data were stored on the local server.
This chapter describes how to configure Adaptive Server Anywhere to
access remote data.
Contents Topic: page
Introduction 592
Introduction
Adaptive Server Anywhere remote data access gives you access to data in
other data sources. You can use this feature to migrate data into an Adaptive
Server Anywhere database. You can also use the feature to query data across
databases, although performance for such multi-database queries is much
slower than when all the data is in a single Adaptive Server Anywhere
database.
With remote data access you can:
♦ Use Adaptive Server Anywhere to move data from one location to
another using insert-select.
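For example, assuming a proxy table named remote_orders has already been created for a table on a remote server, a single insert-select statement copies its rows into a local table with a matching structure (both table names here are illustrative):
INSERT INTO local_orders
SELECT * FROM remote_orders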
Basic concepts to access remote data
This section describes the basic concepts required to access remote data.
Server classes
A server class is assigned to each remote server. The server class specifies
the access method used to interact with the server. Different types of remote
servers require different access methods. The server classes provide
Adaptive Server Anywhere detailed server capability information. Adaptive
Server Anywhere adjusts its interaction with the remote server based on
those capabilities.
There are currently two groups of server classes. The first is JDBC-based;
the second is ODBC-based.
The JDBC-based server classes are:
♦ asajdbc for Adaptive Server Anywhere (version 6 and later)
♦ asejdbc for Adaptive Server Enterprise and SQL Server (version 10
and later)
The ODBC-based server classes are:
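♦ asaodbc
♦ aseodbc
♦ db2odbc
♦ mssodbc
♦ oraodbc
♦ odbc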
Working with remote servers
Before you can map remote objects to a local proxy table, you must define
the remote server where the remote object is located. When you define a
remote server, an entry is added to the SYSSERVERS table for the remote
server. This section describes how to create, alter, and delete a remote server
definition.
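Example 1 The following statement creates an entry in the SYSSERVERS table for the JDBC-based Adaptive Server Enterprise server named ASEserver:
CREATE SERVER ASEserver
CLASS 'ASEJDBC'
USING 'rimu:6666'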
where:
♦ ASEserver is the name of the remote server
♦ ASEJDBC is a keyword indicating that the server is Adaptive Server
Enterprise and the connection to it is JDBC-based
♦ rimu:6666 is the machine name and the TCP/IP port number where the
remote server is located
Example 2 The following statement creates an entry in the SYSSERVERS table for the
ODBC-based Adaptive Server Anywhere server named testasa:
CREATE SERVER testasa
CLASS ’ASAODBC’
USING ’test4’
where:
♦ testasa is the name by which the remote server is known within this
database.
where:
♦ remasa is the name by which the remote server is known within this
database.
♦ ASAODBC is a keyword indicating that the server is Adaptive Server
Anywhere and the connection to it uses ODBC.
♦ USING is the reference to the ODBC driver manager.
Example 4 On Unix platforms the following statement creates an entry in the sysservers
table for the ODBC-based Adaptive Server Enterprise server named remase:
CREATE SERVER remase
CLASS 'aseodbc'
USING 'driver=/opt/sybase/SYBSsa9/drivers/lib/libodbc.so;dsn=my_ase_dsn'
where:
♦ remase is the name by which the remote server is known within this
database
♦ ASEODBC is a keyword indicating that the server is Adaptive Server
Enterprise and the connection to it uses ODBC
♦ USING is the reference to the ODBC driver manager.
You can create a remote server using a wizard in Sybase Central. For more
information, see “Creating remote servers” on page 596.
❖ To create a remote server (Sybase Central)
1. Connect to the host database from Sybase Central.
2. Open the Remote Servers folder for that database.
3. Right-click the remote server you want to delete and choose Delete from
the popup menu.
3. Right-click the remote server and choose Properties from the popup
menu.
Example The following statement changes the server class of the server named
ASEserver to aseodbc. In this example, the Data Source Name for the server
is ASEserver.
ALTER SERVER ASEserver
CLASS ’aseodbc’
Working with external logins
By default, Adaptive Server Anywhere uses the names and passwords of its
clients whenever it connects to a remote server on behalf of those clients.
However, this default can be overridden by creating external logins. External
logins are alternate login names and passwords to be used when
communicating with a remote server.
☞ For more information, see “Using integrated logins” [ASA Database
Administration Guide, page 84].
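For example, the following statement creates an external login for the local user fred on the server ASEserver; the remote login name frederick and its password are illustrative:
CREATE EXTERNLOGIN fred TO ASEserver
REMOTE LOGIN frederick IDENTIFIED BY banana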
3. In the left pane, select the remote server and then click the External
Logins tab in the right pane.
4. Right-click the external login and choose Delete from the popup menu.
Example The following statement drops the external login for the local user fred
created in the example above:
DROP EXTERNLOGIN fred TO ASEserver
☞ See also
♦ “DROP EXTERNLOGIN statement” [ASA SQL Reference, page 436]
Working with proxy tables
Location transparency of remote data is enabled by creating a local proxy
table that maps to the remote object. Use one of the following statements to
create a proxy table:
♦ If the table already exists at the remote storage location, use the CREATE
EXISTING TABLE statement. This statement defines the proxy table for
an existing table on the remote server.
♦ If the table does not exist at the remote storage location, use the CREATE
TABLE statement. This statement creates a new table on the remote
server, and also defines the proxy table for that table.
♦ Server This is the name by which the server is known in the current
database, as specified in the CREATE SERVER statement. This field is
mandatory for all remote data sources.
♦ Database The meaning of the database field depends on the data
source. In some cases this field does not apply and should be left empty.
The periods are still required, however.
In Adaptive Server Enterprise, database specifies the database where the
table exists. For example master or pubs2.
In Adaptive Server Anywhere, this field does not apply; leave it empty.
In Excel, Lotus Notes, and Access, you must include the name of the file
containing the table. If the file name includes a period, use the semicolon
delimiter.
♦ Owner If the database supports the concept of ownership, this field
represents the owner name. This field is only required when several
owners have tables with the same name.
♦ Table-name This specifies the name of the table. In the case of an Excel
spreadsheet, this is the name of the “sheet” in the workbook. If the table
name is left empty, the remote table name is assumed to be the same as
the local proxy table name.
♦ Excel:
’excel;d:\pcdb\quarter3.xls;;sheet1$’
♦ Access:
’access;\\server1\production\inventory.mdb;;parts’
Tip
Proxy tables are displayed in the right pane on the Proxy Tables tab when
their remote server is selected in the left pane. They also appear in the
Tables folder. They are distinguished from other tables by a letter P on
their icon.
(Diagram: the proxy table p_employee on the local server maps, via the proxy-table mapping, to the employee table on the asademo1 server.)
Example 2 The following statement maps the proxy table a1 to the Microsoft Access
file mydbfile.mdb. In this example, the AT keyword uses the semicolon (;) as
a delimiter. The server defined for Microsoft Access is named access.
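A statement of the following form creates the mapping (a sketch; the table name inside the Access file is assumed to also be a1):
CREATE EXISTING TABLE a1
AT 'access;mydbfile.mdb;;a1'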
Example The following statement creates a table named employee on the remote
server asademo1, and creates a proxy table named members that maps to the
remote location:
CREATE TABLE members
( membership_id INTEGER NOT NULL,
member_name CHAR(30) NOT NULL,
office_held CHAR( 20 ) NULL)
AT ’asademo1..DBA.employee’
sp_remote_columns servername, tablename [, owner ]
[, database]
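For example, the following call lists the columns of the remote employee table on the server asademo1; the server and table names follow the surrounding examples:
CALL sp_remote_columns( 'asademo1', 'employee' )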
(Diagram: on the local server testasa, proxy tables with columns emp_fname, emp_lname, dept_id, and dept_name map to the corresponding tables in asademo1.)
In real-world cases, you may use joins between tables on different Adaptive
Server Anywhere databases. Here we describe a simple case using just one
database to illustrate the principles.
3. Connect to empty.db from Interactive SQL using the user ID DBA and
the password SQL.
4. In the new database, create a remote server named testasa. Its server class
is asaodbc, and the connection information is ’ASA 9.0 Sample’:
CREATE SERVER testasa
CLASS ’asaodbc’
USING ’ASA 9.0 Sample’
5. In this example, we use the same user ID and password on the remote
database as on the local database, so no external logins are needed.
6. Define the employee proxy table:
CREATE EXISTING TABLE employee
AT ’testasa..DBA.employee’
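7. Define the department proxy table:
CREATE EXISTING TABLE department
AT 'testasa..DBA.department'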
8. Use the proxy tables in the SELECT statement to perform the join.
SELECT emp_fname, emp_lname, dept_name
FROM employee JOIN department
ON employee.dept_id = department.dept_id
ORDER BY emp_lname
♦ Connect to one of the databases that you will be performing joins from.
For example, connect to db1.
♦ Perform a CREATE SERVER for each other local database you will be
accessing. This sets up a loopback connection to your Adaptive Server
Anywhere server.
CREATE SERVER local_db2
CLASS ’asaodbc’
USING ’testasa_db2’
CREATE SERVER local_db3
CLASS ’asaodbc’
USING ’testasa_db3’
Sending native statements to remote servers
Use the FORWARD TO statement to send one or more statements to the
remote server in its native syntax. This statement can be used in two ways:
♦ To send a statement to a remote server.
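♦ To place Adaptive Server Anywhere into passthrough mode for sending a series of statements to a remote server.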
Example 1 The following statement verifies connectivity to the server named ASEserver
by selecting the version string:
FORWARD TO ASEserver {SELECT @@version}
Example 2 The following statements show a passthrough session with the server named
ASEserver:
FORWARD TO ASEserver
select * from titles
select * from authors
FORWARD TO
☞ For more information, see “CREATE PROCEDURE statement” [ASA
SQL Reference, page 355].
Example Here is an example with a parameter:
CREATE PROCEDURE remoteuser (IN uname char(30))
AT ’bostonase.master.dbo.sp_helpuser’
call remoteuser(’joe’)
Data types for remote procedures The following data types are allowed for RPC parameters. Other data types are disallowed:
♦ [ UNSIGNED ] SMALLINT
♦ [ UNSIGNED ] INT
♦ [ UNSIGNED ] BIGINT
♦ TINYINT
♦ REAL
♦ DOUBLE
♦ CHAR
♦ BIT
NUMERIC and DECIMAL data types are allowed for IN parameters, but
not for OUT or INOUT parameters.
Transaction management and remote data
Transactions provide a way to group SQL statements so that they are treated
as a unit—either all work performed by the statements is committed to the
database, or none of it is.
For the most part, transaction management with remote tables is the same as
transaction management for local tables in Adaptive Server Anywhere, but
there are some differences. They are discussed in the following section.
☞ For a general discussion of transactions, see “Using Transactions and
Isolation Levels” on page 103.
Internal operations
This section describes the underlying operations on remote servers
performed by Adaptive Server Anywhere on behalf of client applications.
Query parsing
When a statement is received from a client, it is parsed. An error is raised if
the statement is not a valid Adaptive Server Anywhere SQL statement.
Query normalization
The next step is called query normalization. During this step, referenced
objects are verified and some data type compatibility is checked.
For example, consider the following query:
SELECT *
FROM t1
WHERE c1 = 10
The query normalization stage verifies that table t1 with a column c1 exists
in the system tables. It also verifies that the data type of column c1 is
compatible with the value 10. If the column’s data type is datetime, for
example, this statement is rejected.
Query preprocessing
Query preprocessing prepares the query for optimization. It may change the
representation of a statement so that the SQL statement Adaptive Server
Anywhere generates for passing to a remote server will be syntactically
different from the original statement.
Preprocessing performs view expansion so that a query can operate on tables
referenced by the view. Expressions may be reordered and subqueries may
be transformed to improve processing efficiency. For example, some
subqueries may be converted into joins.
Server capabilities
The previous steps are performed on all queries, both local and remote.
The following steps depend on the type of SQL statement and the
capabilities of the remote servers involved.
Each remote server defined to Adaptive Server Anywhere has a set of
capabilities associated with it. These capabilities are stored in the
syscapabilities system table. These capabilities are initialized during the first
connection to a remote server. The generic server class odbc relies strictly on
information returned from the ODBC driver to determine these capabilities.
Other server classes such as db2odbc have more detailed knowledge of the
capabilities of a remote server type and use that knowledge to supplement
what is returned from the driver.
Once syscapabilities is initialized for a server, the capability information is
retrieved only from the system table. This allows a user to alter the known
capabilities of a server.
Since a remote server may not support all of the features of a given SQL
statement, Adaptive Server Anywhere must break the statement into simpler
components to the point that the query can be given to the remote server.
SQL features not passed off to a remote server must be evaluated by
Adaptive Server Anywhere itself.
For example, a query may contain an ORDER BY clause. If a remote
server cannot perform ORDER BY, the statement is sent to the remote server
without it and Adaptive Server Anywhere performs the ORDER BY on the
result returned, before returning the result to the user. The result is that the
user can employ the full range of Adaptive Server Anywhere supported SQL
without concern for the features of a particular back end.
Partial passthrough of the statement
If a statement contains references to multiple servers, or uses SQL features
not supported by a remote server, the query is decomposed into simpler
parts.
Select SELECT statements are broken down by removing portions that cannot be
passed on and letting Adaptive Server Anywhere perform the feature. For
example, let’s say a remote server can not process the atan2() function in the
following statement:
select a,b,c from t1 where atan2(b,10) > 3 and c = 10
Each time a row is found, Adaptive Server Anywhere would calculate the value of atan2(b,10) and apply that part of the search condition itself; only the portion of the statement the remote server can handle is sent to it.
If a already has a value that equals the “new value”, a positioned UPDATE
would not be necessary and would not be sent remotely.
In order to process an UPDATE or DELETE that requires a table scan, the
remote data source must support the ability to perform a positioned
UPDATE or DELETE (“where current of cursor”). Some data sources do
not support this capability.
Troubleshooting remote data access
This section provides some hints for troubleshooting remote servers.
Case sensitivity
The case sensitivity setting of your Adaptive Server Anywhere database
should match the settings used by any remote servers accessed.
Adaptive Server Anywhere databases are created case insensitive by default.
With this configuration, unpredictable results may occur when selecting
from a case sensitive database. Different results will occur depending on
whether ORDER BY or string comparisons are pushed off to a remote server
or evaluated by the local Adaptive Server Anywhere.
Connectivity problems
Take the following steps to be sure you can connect to a remote server:
♦ Determine that you can connect to a remote server using a client tool
such as Interactive SQL before configuring Adaptive Server Anywhere.
♦ Turn on remote tracing for a trace of the interactions with remote servers.
SET OPTION CIS_OPTION = 7
CHAPTER 19
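Server Classes for Remote Data Access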
About this chapter This chapter describes how Adaptive Server Anywhere interfaces with
different classes of servers. It describes
♦ Types of servers that each server class supports
♦ The USING clause value for the CREATE SERVER statement for each
server class
♦ Special configuration requirements
Overview 626
Overview
The server class you specify in the CREATE SERVER statement determines
the behavior of a remote connection. The server classes give Adaptive
Server Anywhere detailed server capability information. Adaptive Server
Anywhere formats SQL statements specific to a server’s capabilities.
There are two categories of server classes:
♦ JDBC-based server classes
♦ ODBC-based server classes
Each server class has a set of unique characteristics that database
administrators and programmers need to know to configure the server for
remote data access.
When using this chapter, refer both to the section generic to the server class
category (JDBC-based or ODBC-based), and to the section specific to the
individual server class.
♦ The Java virtual machine needs more than the default amount of memory
to load and run jConnect. Set these memory options to at least the
following values:
SET OPTION PUBLIC.JAVA_NAMESPACE_SIZE = 3000000
SET OPTION PUBLIC.JAVA_HEAP_SIZE = 1000000
♦ Any remote server that you access using the asejdbc or asajdbc server
class must be set up to handle a jConnect 4.x based client. The jConnect
setup scripts are SQL_anywhere.SQL for Adaptive Server Anywhere or
SQL_server.SQL for Adaptive Server Enterprise. Run these against any
remote server you will be using.
USING parameter value in the CREATE SERVER statement
You must perform a separate CREATE SERVER for each Adaptive Server
Anywhere database you intend to access. For example, if an Adaptive Server
Anywhere server named testasa is running on the machine ‘banana’ and
owns three databases (db1, db2, db3), you would configure the local
Adaptive Server Anywhere similar to this:
CREATE SERVER testasadb1
CLASS ’asajdbc’
USING ’banana:2638/db1’
CREATE SERVER testasadb2
CLASS ’asajdbc’
USING ’banana:2638/db2’
CREATE SERVER testasadb3
CLASS ’asajdbc’
USING ’banana:2638/db3’
If you do not specify a /databasename value, the remote connection uses the
remote Adaptive Server Anywhere default database.
☞ For more information about the CREATE SERVER statement, see
“CREATE SERVER statement” [ASA SQL Reference, page 372].
bit bit
tinyint tinyint
smallint smallint
int int
integer integer
float real
real real
double float
smallmoney numeric(10,4)
money numeric(19,4)
date datetime
time datetime
timestamp datetime
smalldatetime datetime
datetime datetime
char(n) varchar(n)
character(n) varchar(n)
varchar(n) varchar(n)
text text
binary(n) binary(n)
image image
bigint numeric(20,0)
ODBC-based server classes
The ODBC-based server classes include:
♦ asaodbc
♦ aseodbc
♦ db2odbc
♦ mssodbc
♦ oraodbc
♦ odbc
• Under the Performance tab:
Set Prepare Method to “2-Full.”
Set Fetch Array Size as large as possible for best performance. This
increases memory requirements since this is the number of rows that
must be cached in memory. Sybase recommends using a value of 100.
Set Select Method to “0-Cursor.”
Set Packet Size to as large as possible. Sybase recommends using a
value of -1.
Set Connection Cache to 1.
Bit bit
Tinyint tinyint
Smallint smallint
Int int
Integer integer
Float real
Real real
Double float
Smallmoney numeric(10,4)
Money numeric(19,4)
Date datetime
Time datetime
Timestamp datetime
Smalldatetime datetime
Datetime datetime
char(n) varchar(n)
Character(n) varchar(n)
varchar(n) varchar(n)
Text text
binary(n) binary(n)
Image image
Bigint numeric(20,0)
Adaptive Server Anywhere data type    DB2 default data type
Bit smallint
Tinyint smallint
Smallint smallint
Int int
Integer int
Bigint decimal(20,0)
char(1–254) varchar(n)
char(255–4000) varchar(n)
Character(255–4000) varchar(n)
varchar(1–4000) varchar(n)
real real
float float
double float
smallmoney decimal(10,4)
money decimal(19,4)
date date
time time
smalldatetime timestamp
datetime timestamp
timestamp timestamp
Adaptive Server Anywhere data type    Oracle data type
bit number(1,0)
tinyint number(3,0)
smallint number(5,0)
int number(11,0)
bigint number(20,0)
float float
real real
smallmoney numeric(13,4)
money number(19,4)
date date
time date
timestamp date
smalldatetime date
datetime date
Adaptive Server Anywhere data type    Microsoft SQL Server default data type
bit bit
tinyint tinyint
smallint smallint
int int
bigint numeric(20,0)
real real
smallmoney smallmoney
money money
date datetime
time datetime
timestamp datetime
smalldatetime datetime
datetime datetime
character(n) char(n)
double float
uniqueidentifierstr uniqueidentifier
specify a default workbook name associated with that data source. However,
when you issue a CREATE TABLE statement, you can override the default
and specify a workbook name in the location string. This allows you to use a
single ODBC DSN to access all of your Excel workbooks.
In this example, an ODBC data source named excel was created. To create a
workbook named work1.xls with a sheet (table) called mywork:
CREATE TABLE mywork (a int, b char(20))
AT ’excel;d:\work1.xls;;mywork’
You can import existing worksheets into Adaptive Server Anywhere using
CREATE EXISTING, under the assumption that the first row of your
spreadsheet contains column names.
CREATE EXISTING TABLE mywork
AT ’excel;d:\work1;;mywork’
If Adaptive Server Anywhere reports that the table is not found, you may
need to explicitly state the column and row range you wish to map to. For
example:
CREATE EXISTING TABLE mywork
AT ’excel;d:\work1;;mywork$’
Adding the $ to the sheet name indicates that the entire worksheet should be
selected.
Note in the location string specified by AT that a semicolon is used instead
of a period for field separators. This is because periods occur in the file
names. Excel does not support the owner name field so leave this blank.
Deletes are not supported. Also some updates may not be possible since the
Excel driver does not support positioned updates.
Access databases are stored in a .mdb file. Using the ODBC manager, create
an ODBC data source and map it to one of these files. A new .mdb file can
be created through the ODBC manager. This database file becomes the
default if you don’t specify a different default when you create a table
through Adaptive Server Anywhere.
The following examples assume an ODBC data source named access.
CREATE TABLE tab1 (a int, b char(10))
AT ’access...tab1’
or
CREATE TABLE tab1 (a int, b char(10))
AT ’access;d:\pcdb\data.mdb;;tab1’
or
CREATE EXISTING TABLE tab1
AT ’access;d:\pcdb\data.mdb;;tab1’
Access does not support the owner name qualification; leave it empty.
You can store FoxPro tables together inside a single FoxPro database file
(.dbc), or, you can store each table in its own separate .dbf file. When using
.dbf files, be sure the file name is filled into the location string; otherwise the
directory that Adaptive Server Anywhere was started in is used.
CREATE TABLE fox1 (a int, b char(20))
AT ’foxpro;d:\pcdb;;fox1’
This statement creates a file named d:\pcdb\fox1.dbf when you choose the
“free table directory” option in the ODBC Driver Manager.
You can obtain this driver (version 2.04.0203) from the Lotus Web site.
Read the documentation that comes with it for an explanation of how Notes
data maps to relational tables. You can easily map Adaptive Server
Anywhere tables to Notes forms.
Here is how to set up Adaptive Server Anywhere to access the Address
sample file.
♦ Create an ODBC data source using the NotesSQL driver. The database
will be the sample names file: c:\notes\data\names.nsf. The Map Special
Characters option should be turned on. For this example, the Data Source
Name is my_notes_dsn.
Avoiding password prompts Lotus Notes does not support sending a user name and password through the ODBC API. If you try to access Lotus Notes using a password-protected ID,
a window appears on the machine where Adaptive Server Anywhere is
running, and prompts you for a password. Avoid this behavior in multi-user
server environments.
To access Lotus Notes unattended, without ever receiving a password
prompt, you must use a non-password-protected ID. You can remove
password protection from your ID by clearing it (File ➤ Tools ➤ User ID ➤
Clear Password), unless your Domino administrator required a password
when your ID was created. In this case, you will not be able to clear it.
PART VI
STORED PROCEDURES AND TRIGGERS
This part describes how to build logic into your database using SQL stored
procedures and triggers. Storing logic in the database makes it available
automatically to all applications, providing consistency, performance, and
security benefits. The Stored Procedure debugger is a powerful tool for
debugging all kinds of logic.
CHAPTER 20
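Using Procedures, Triggers, and Batches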
About this chapter Procedures and triggers store procedural SQL statements in the database for
use by all applications. They enhance the security, efficiency, and
standardization of databases. User-defined functions are one kind of procedure that returns a value to the calling environment for use in queries
and other SQL statements. Batches are sets of SQL statements submitted to
the database server as a group. Many features available in procedures and
triggers, such as control statements, are also available in batches.
☞ For many purposes, server-side JDBC provides a more flexible way to
build logic into the database than SQL stored procedures. For information
about JDBC, see “JDBC Programming” [ASA Programming Guide, page 103].
Benefits of procedures and triggers
Definitions for procedures and triggers appear in the database, separately
from any one database application. This separation provides a number of
advantages.
Standardization Procedures and triggers standardize actions performed by more than one
application program. By coding the action once and storing it in the database
for future use, applications need only call the procedure or fire the trigger to
achieve the desired result repeatedly. And since changes occur in only one
place, all applications using the action automatically acquire the new
functionality if the implementation of the action changes.
Efficiency Procedures and triggers used in a network database server environment can
access data in the database without requiring network communication. This
means they execute faster and with less impact on network performance than
if they had been implemented in an application on one of the client
machines.
When you create a procedure or trigger, it is automatically checked for
correct syntax, and then stored in the system tables. The first time any
application calls or fires a procedure or trigger, it is compiled from the
system tables into the server’s virtual memory and executed from there.
Since one copy of the procedure or trigger remains in memory after the first
execution, repeated executions of the same procedure or trigger happen
instantly. As well, several applications can use a procedure or trigger
concurrently, or one application can use it recursively.
Procedures are less efficient if they contain simple queries and have many
arguments. For complex queries, procedures are more efficient.
Security Procedures and triggers provide security by allowing users limited access to
data in tables that they cannot directly examine or modify.
Triggers, for example, execute under the table permissions of the owner of
the associated table, but any user with permissions to insert, update or delete
rows in the table can fire them. Similarly, procedures (including user-defined
functions) execute with permissions of the procedure owner, but any user
granted permissions can call them. This means that procedures and triggers
can (and usually do) have different permissions than the user ID that invoked
them.
Introduction to procedures
To use procedures, you need to understand how to:
♦ Create procedures
♦ Call procedures from a database application
Creating procedures
Adaptive Server Anywhere provides a number of tools that let you create a
new procedure.
In Sybase Central, you can use a wizard to provide necessary information.
The Procedure Creation wizard also provides the option of using procedure
templates.
Alternatively, you can use the CREATE PROCEDURE statement to create
procedures. However, you must have RESOURCE authority. Where you
enter the statement depends on which tool you use.
❖ To create a new remote procedure (Sybase Central)
1. Connect to a database as a user with DBA authority.
Example The following simple example creates the procedure new_dept, which
carries out an INSERT into the department table of the sample database,
creating a new department.
CREATE PROCEDURE new_dept (
IN id INT,
IN name CHAR(35),
IN head_id INT )
BEGIN
INSERT
INTO DBA.department ( dept_id,
dept_name, dept_head_id )
VALUES ( id, name, head_id );
END
Altering procedures
You can modify an existing procedure using either Sybase Central or
Interactive SQL. You must have DBA authority or be the owner of the
procedure.
In Sybase Central, you cannot rename an existing procedure directly.
Instead, you must create a new procedure with the new name, copy the
previous code to it, and then delete the old procedure.
Alternatively, you can use an ALTER PROCEDURE statement to modify an
existing procedure. You must include the entire new procedure in this
statement (in the same syntax as in the CREATE PROCEDURE statement
that created the procedure). You must also reassign user permissions on the
procedure.
☞ For more information on altering database object properties, see
“Setting properties for database objects” on page 36.
☞ For more information on granting or revoking permissions for
procedures, see “Granting permissions on procedures” [ASA Database
Administration Guide, page 408] and “Revoking user permissions” [ASA
Database Administration Guide, page 411].
Tip
If you wish to copy code between procedures, you can open a separate
window for each procedure.
❖ To alter the code of a procedure (SQL)
1. Connect to the database.
Calling procedures
CALL statements invoke procedures. Procedures can be called by an
application program, or by other procedures and triggers.
☞ For more information, see “CALL statement” [ASA SQL Reference,
page 303].
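For example, the following statement calls the new_dept procedure created earlier in this chapter; the argument values shown are illustrative:
CALL new_dept( 210, 'Eastern Sales', 902 )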
After this call, you may wish to check the department table to see that the
new department has been added.
All users who have been granted EXECUTE permissions for the procedure
can call the new_dept procedure, even if they have no permissions on the
department table.
☞ For more information about EXECUTE permissions, see “EXECUTE
statement [ESQL]” [ASA SQL Reference, page 449].
Another way of calling a procedure that returns a result set is to call it in a
query. You can execute queries on result sets of procedures and apply
WHERE clauses and other SELECT features to limit the result set.
SELECT t.id, t.quantity_ordered AS q
FROM sp_customer_products( 149 ) t
☞ For more information, see “FROM clause” [ASA SQL Reference, page 469].
Copying procedures in Sybase Central
In Sybase Central, you can copy procedures between databases. To do so,
select the procedure in the left pane of Sybase Central and drag it to the
Procedures & Functions folder of another connected database. A new
procedure is then created, and the original procedure’s code is copied to it.
Note that only the procedure code is copied to the new procedure. The other
procedure properties (permissions, etc.) are not copied. A procedure can be
copied to the same database, provided it is given a new name.
Deleting procedures
Once you create a procedure, it remains in the database until someone
explicitly removes it. Only the owner of the procedure or a user with DBA
authority can drop the procedure from the database.
CREATE PROCEDURE AverageSalary( OUT avgsal
NUMERIC (20,3) )
BEGIN
SELECT AVG( salary )
INTO avgsal
FROM employee;
END
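Before calling the procedure, create a variable to hold the result; the name Average is the one used in the steps below:
CREATE VARIABLE Average NUMERIC(20,3);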
4. Call the procedure using the created variable to hold the result:
CALL AverageSalary(Average)
If the procedure was created and run properly, the Interactive SQL
Messages pane does not display any errors.
5. Execute the SELECT Average statement to inspect the value of the
variable.
Look at the value of the output variable Average. The Results tab in the
Results pane displays the value 49988.623 for this variable, the average
employee salary.
If Interactive SQL calls this procedure, the names in the RESULT clause are
matched to the results of the query and used as column headings in the
displayed results.
To test this procedure from Interactive SQL, you can CALL it, specifying
one of the departments of the company. In Interactive SQL, the results
appear on the Results tab in the Results pane.
Example To list the salaries of employees in the R & D department (department ID
100), type the following:
CALL SalaryList (100)
Employee ID Salary
102 45700
105 62000
160 57490
243 72995
... ...
Interactive SQL can only return multiple result sets if you have this option
enabled on the Results tab of the Options dialog. Each result set appears on
a separate tab in the Results pane.
☞ For more information, see “Returning multiple result sets from
procedures” on page 679.
Introduction to user-defined functions
User-defined functions are a class of procedures that return a single value to
the calling environment. Adaptive Server Anywhere treats all user-defined
functions as idempotent unless they are declared NOT DETERMINISTIC.
Idempotent functions return a consistent result for the same parameters and
are free of side effects. Two successive calls to an idempotent function with
the same parameters return the same result, and have no unwanted
side-effects on the query’s semantics.
☞ For more information about non-deterministic and deterministic
functions, see “Function caching” on page 449.
This section introduces creating, using, and dropping user-defined functions.
Note
If you are using a tool other than Interactive SQL or Sybase Central, you
may need to change the command delimiter to something other than the
semicolon.
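For example, a function along the following lines builds a full name from a first and a last name (a sketch; the exact definition of the fullname function used with the sample database may differ):
CREATE FUNCTION fullname( firstname CHAR(30), lastname CHAR(30) )
RETURNS CHAR(61)
BEGIN
DECLARE name CHAR(61);
SET name = firstname || ' ' || lastname;
RETURN ( name );
END
Once created, the function can be used anywhere an expression of its return type is allowed. For example, the following query applies it to each row of the employee table:
SELECT fullname( emp_fname, emp_lname )
FROM employee;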
Fran Whitney
Matthew Cobb
Philip Chin
...
The following statement in Interactive SQL returns a full name from a
supplied first and last name:
SELECT fullname (’Jane’, ’Smith’);
fullname (‘Jane’,’Smith’)
Jane Smith
Any user who has been granted EXECUTE permissions for the function can
use the fullname function.
Example The following user-defined function illustrates local declarations of
variables.
The customer table includes some Canadian customers sprinkled among
those from the USA, but there is no country column. The user-defined
function nationality uses the fact that the US zip code is numeric while the
Canadian postal code begins with a letter to distinguish Canadian and US
customers.
CREATE FUNCTION nationality( cust_id INT )
RETURNS CHAR( 20 )
BEGIN
DECLARE natl CHAR(20);
IF cust_id IN ( SELECT id FROM customer
WHERE LEFT(zip,1) > ’9’) THEN
SET natl = ’CDN’;
ELSE
SET natl = ’USA’;
END IF;
RETURN ( natl );
END
This example declares a variable natl to hold the nationality string, uses a
SET statement to set a value for the variable, and returns the value of the natl
string to the calling environment.
The following query lists all Canadian customers in the customer table:
SELECT *
FROM customer
WHERE nationality(id) = ’CDN’
Notes While this function is useful for illustration, it may perform very poorly if
used in a SELECT involving many rows. For example, if you used the
SELECT query on a table containing 100 000 rows, of which 10 000 are
returned, the function will be called 10 000 times. If you use it in the
WHERE clause of the same query, it would be called 100 000 times.
Introduction to triggers
A trigger is a special form of stored procedure that is executed automatically
when a statement that modifies data is executed. You use triggers whenever
referential integrity and other declarative constraints are insufficient.
☞ For more information on referential integrity, see “Ensuring Data
Integrity” on page 79 and “CREATE TABLE statement” [ASA SQL Reference,
page 385].
Creating triggers
You create triggers using either Sybase Central or Interactive SQL. In
Sybase Central, you can use a wizard to provide necessary information. In
Interactive SQL, you can use a CREATE TRIGGER statement. For both
tools, you must have DBA or RESOURCE authority to create a trigger and
you must have ALTER permissions on the table associated with the trigger.
The body of a trigger consists of a compound statement: a set of
semicolon-delimited SQL statements bracketed by a BEGIN and an END
statement.
You cannot use COMMIT and ROLLBACK and some ROLLBACK TO
SAVEPOINT statements within a trigger.
☞ For more information, see the list of cross-references at the end of this
section.
❖ To create a new trigger for a given table (SQL)
1. Connect to a database.
2. Execute a CREATE TRIGGER statement.
Example 1: A row-level INSERT trigger The following trigger is an example of a row-level INSERT trigger. It checks that the birth date entered for a new employee is reasonable:
CREATE TRIGGER check_birth_date
AFTER INSERT ON Employee
REFERENCING NEW AS new_employee
FOR EACH ROW
BEGIN
DECLARE err_user_error EXCEPTION
FOR SQLSTATE ’99999’;
IF new_employee.birth_date > ’June 6, 2001’ THEN
SIGNAL err_user_error;
END IF;
END
This trigger fires after any row is inserted into the employee table. It detects
and disallows any new rows that correspond to birth dates later than June 6,
2001.
The phrase REFERENCING NEW AS new_employee allows statements in
the trigger code to refer to the data in the new row using the alias
new_employee.
Signaling an error causes the triggering statement, as well as any previous
effects of the trigger, to be undone.
For an INSERT statement that adds many rows to the employee table, the
check_birth_date trigger fires once for each new row. If the trigger fails for
any of the rows, all effects of the INSERT statement roll back.
You can specify that the trigger fires before the row is inserted rather than
after by changing the first line of the example to:
CREATE TRIGGER mytrigger BEFORE INSERT ON Employee
The REFERENCING NEW clause refers to the inserted values of the row; it
is independent of the timing (BEFORE or AFTER) of the trigger.
You may find it easier in some cases to enforce constraints using declaration
referential integrity or CHECK constraints, rather than triggers. For
example, implementing the above example with a column check constraint
proves more efficient and concise:
CHECK (@col <= ’June 6, 2001’)
Example 2: A row-level DELETE trigger example The following CREATE TRIGGER statement defines a row-level DELETE trigger:
CREATE TRIGGER mytrigger BEFORE DELETE ON employee
REFERENCING OLD AS oldtable
FOR EACH ROW
BEGIN
...
END
The REFERENCING OLD clause enables the delete trigger code to refer to
the values in the row being deleted using the alias oldtable.
You can specify that the trigger fires after the row is deleted rather than
before, by changing the first line of the example to:
CREATE TRIGGER check_birth_date AFTER DELETE ON employee
Executing triggers
Triggers execute automatically whenever an INSERT, UPDATE, or
DELETE operation is performed on the table named in the trigger. A
row-level trigger fires once for each row affected, while a statement-level
trigger fires once for the entire statement.
When an INSERT, UPDATE, or DELETE fires a trigger, the order of
operation is as follows:
1. BEFORE triggers fire.
2. Referential actions are performed.
3. The operation itself is performed.
4. AFTER triggers fire.
If any of the steps encounter an error not handled within a procedure or
trigger, the preceding steps are undone, the subsequent steps are not
performed, and the operation that fired the trigger fails.
Altering triggers
You can modify an existing trigger using either Sybase Central or
Interactive SQL. You must be the owner of the table on which the trigger is
defined, or be DBA, or have ALTER permissions on the table and have
RESOURCE authority.
In Sybase Central, you cannot rename an existing trigger directly. Instead,
you must create a new trigger with the new name, copy the previous code to
it, and then delete the old trigger.
Alternatively, you can use an ALTER TRIGGER statement to modify an
existing trigger. You must include the entire new trigger in this statement (in
the same syntax as in the CREATE TRIGGER statement that created the
trigger).
☞ For more information on altering database object properties, see
“Setting properties for database objects” on page 36.
2. Select the desired trigger. You can then do one of the following:
♦ Edit the code directly on the SQL tab in the right pane.
♦ Translate the code to Watcom-SQL or Transact-SQL prior to editing it:
• Right-click the desired trigger and choose Open as Watcom-SQL or
Open as Transact-SQL from the popup menu.
• Edit the code on the SQL tab in the right pane.
Tip
If you wish to copy code between triggers, you can open a separate
window for each trigger.
Dropping triggers
Once you create a trigger, it remains in the database until someone explicitly
removes it. You must have ALTER permissions on the table associated with
the trigger to drop the trigger.
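For example, the following statement drops the check_birth_date trigger created earlier in this chapter:
DROP TRIGGER check_birth_date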
Triggers execute using the permissions of the owner of the table on which
they are defined, not the permissions of the user who caused the trigger to
fire, and not the permissions of the user who created the trigger.
When a trigger refers to a table, it uses the group memberships of the table
creator to locate tables with no explicit owner name specified. For example,
if a trigger on user_1.Table_A references Table_B and does not specify the
owner of Table_B, then either Table_B must have been created by user_1 or
user_1 must be a member of a group (directly or indirectly) that is the owner
of Table_B. If neither condition is met, a table not found message results
when the trigger fires.
Also, user_1 must have permissions to carry out the operations specified in
the trigger.
Introduction to batches
A simple batch consists of a set of SQL statements, separated by semicolons
or separated by a separate line with just the word go on it. The use of go is
recommended. For example, the following set of statements form a batch,
which creates an Eastern Sales department and transfers all sales reps from
Massachusetts to that department.
INSERT
INTO department ( dept_id, dept_name )
VALUES ( 220, ’Eastern Sales’ )
go
UPDATE employee
SET dept_id = 220
WHERE dept_id = 200
AND state = ’MA’
go
COMMIT
go
You can include this set of statements in an application and execute them
together.
IF NOT EXISTS (
SELECT * FROM SYSTABLE
WHERE table_name = ’t1’ ) THEN
CREATE TABLE t1 (
firstcol INT PRIMARY KEY,
secondcol CHAR( 30 )
);
ELSE
MESSAGE ’Table t1 already exists’ TO CLIENT;
END IF
If you run this batch twice from Interactive SQL, it creates the table the first
time you run it and displays the message in the Interactive SQL Messages
pane the next time you run it.
Control statements
There are a number of control statements for logical flow and decision
making in the body of the procedure or trigger, or in a batch. Available
control statements include:
For more information about each statement, see the entries in “SQL
Statements” [ASA SQL Reference, page 241]
encounter an error after updating many rows. If the statement does not
complete, all changes revert back to their original state. The UPDATE
statement is atomic.
All non-compound SQL statements are atomic. You can make a compound
statement atomic by adding the keyword ATOMIC after the BEGIN
keyword.
BEGIN ATOMIC
UPDATE employee
SET manager_ID = 501
WHERE emp_ID = 467;
UPDATE employee
SET birth_date = ’bad_data’;
END
In this example, the two update statements are part of an atomic compound
statement. They must either succeed or fail as one. The first update statement
would succeed. The second one causes a data conversion error since the
value being assigned to the birth_date column cannot be converted to a date.
The atomic compound statement fails and the effect of both UPDATE
statements is undone. Even if the currently executing transaction is
eventually committed, neither statement in the atomic compound statement
takes effect.
If an atomic compound statement succeeds, the changes made within the
compound statement take effect only if the currently executing transaction is
committed.
You cannot use COMMIT and ROLLBACK and some ROLLBACK TO
SAVEPOINT statements within an atomic compound statement (see
“Transactions and savepoints in procedures and triggers” on page 696).
There is a case where some, but not all, of the statements within an atomic
compound statement are executed. This happens when an exception handler
within the compound statement deals with an error.
☞ For more information, see “Using exception handlers in procedures and
triggers” on page 690.
The structure of procedures and triggers
The body of a procedure or trigger consists of a compound statement as
discussed in “Using compound statements” on page 670. A compound
statement consists of a BEGIN and an END, enclosing a set of SQL
statements. Semicolons delimit each statement.
The following statement assigns the DEFAULT NULL, and the procedure
RETURNs instead of executing the query.
CALL customerproducts();
We assume that the calling environment has set up three variables to hold the
values passed to the procedure:
CREATE VARIABLE V1 INT;
CREATE VARIABLE V2 INT;
CREATE VARIABLE V3 INT;
The procedure SampleProc may be called supplying only the first parameter
as follows:
CALL SampleProc( V1 )
in which case the default values are used for var2 and var3.
A more flexible method of calling procedures with optional arguments is to
pass the parameters by name. The SampleProc procedure may be called as
follows:
CALL SampleProc( var1 = V1, var3 = V3 )
or as follows:
CALL SampleProc( var3 = V3, var1 = V1 )
Name
Fran Whitney
Matthew Cobb
Philip Chin
Julie Jordan
...
Returning results from procedures
Procedures can return results in the form of a single row of data, or multiple
rows. Results consisting of a single row of data can be passed back as
arguments to the procedure. Results consisting of multiple rows of data are
passed back as result sets. Procedures can also return a single value given in
the RETURN statement.
For simple examples of how to return results from procedures, see
“Introduction to procedures” on page 649. For more detailed information,
see the following sections.
Using the SET statement The following somewhat artificial procedure returns a value in an OUT
parameter assigned using a SET statement:
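A procedure along these lines fits that description (a sketch; the procedure name greater and its parameters are illustrative):
CREATE PROCEDURE greater( IN a INT, IN b INT, OUT c INT )
BEGIN
IF a > b THEN
SET c = a;
ELSE
SET c = b;
END IF;
END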
Using single-row SELECT statements Single-row queries retrieve at most one row from the database. This type of query uses a SELECT statement with an INTO clause. The INTO clause
follows the select list and precedes the FROM clause. It contains a list of
variables to receive the value for each select list item. There must be the
same number of variables as there are select list items.
When a SELECT statement executes, the server retrieves the results of the
SELECT statement and places the results in the variables. If the query
results contain more than one row, the server returns an error. For queries
returning more than one row, you must use cursors. For information about
returning more than one row from a procedure, see “Returning result sets
from procedures” on page 678.
If the query results in no rows being selected, a row not found warning
appears.
The following procedure returns the results of a single-row SELECT
statement in the procedure parameters.
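A sketch of such a procedure follows; the parameter names customer_ID and Orders match the notes below, while the exact query over the sample database's customer and sales_order tables is an assumption:
CREATE PROCEDURE OrderCount( IN customer_ID INT, OUT Orders INT )
BEGIN
SELECT COUNT( sales_order.id )
INTO Orders
FROM customer
LEFT OUTER JOIN sales_order
ON sales_order.cust_id = customer.id
WHERE customer.id = customer_ID;
END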
You can test this procedure in Interactive SQL using the following
statements, which show the number of orders placed by the customer with
ID 102:
CREATE VARIABLE orders INT;
CALL OrderCount ( 102, orders );
SELECT orders;
Notes ♦ The customer_ID parameter is declared as an IN parameter. This
parameter holds the customer ID passed in to the procedure.
♦ The Orders parameter is declared as an OUT parameter. It holds the
value of the orders variable that returned to the calling environment.
♦ No DECLARE statement is necessary for the Orders variable, as it is
declared in the procedure argument list.
♦ The SELECT statement returns a single row and places it into the
variable Orders.
Company Value
Chadwicks 8076
Notes ♦ The number of variables in the RESULT list must match the number of
the SELECT list items. Automatic data type conversion is carried out
where possible if data types do not match.
♦ The RESULT clause is part of the CREATE PROCEDURE statement,
and does not have a command delimiter.
♦ The names of the SELECT list items do not need to match those of the
RESULT list.
♦ When testing this procedure, Interactive SQL displays only the first result
set by default. You can configure Interactive SQL to display more than
one result set by setting the Show multiple result sets option on the
Results tab of the Options dialog.
♦ You can modify procedure result sets, unless they are generated from a
view. The user calling the procedure requires the appropriate permissions
on the underlying table to modify procedure results. This is different than
the usual permissions for procedure execution, where the procedure
owner must have permissions on the table.
☞ For information about modifying result sets in Interactive SQL, see
“Editing table values in Interactive SQL” [Introducing SQL Anywhere Studio,
page 170].
CREATE PROCEDURE ListPeople()
RESULT ( lname CHAR(36), fname CHAR(36) )
BEGIN
SELECT emp_lname, emp_fname
FROM employee;
SELECT lname, fname
FROM customer;
SELECT last_name, first_name
FROM contact;
END
Notes ♦ To test this procedure in Interactive SQL, enter the following statement in
the SQL Statements pane:
CALL ListPeople ()
Using cursors in procedures and triggers
Cursors retrieve rows one at a time from a query or stored procedure with
multiple rows in its result set. A cursor is a handle or an identifier for the
query or procedure, and for a current position within the result set.
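Using a cursor in a procedure or trigger typically involves the following steps:
1. Declare a cursor for the query or stored procedure using the DECLARE statement.
2. Open the cursor using the OPEN statement.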
3. Use the FETCH statement to retrieve results one row at a time from the
cursor.
4. The warning Row Not Found signals the end of the result set.
5. Close the cursor using the CLOSE statement.
♦ The LOOP statement loops over each row of the query, placing each
company name in turn into the variables ThisName and ThisValue. If
ThisValue is greater than the current top value, TopCompany and
TopValue are reset to ThisName and ThisValue.
♦ The cursor closes at the end of the procedure.
♦ You can also write this procedure without a loop by adding an ORDER
BY value DESC clause to the SELECT statement. Then, only the first
row of the cursor needs to be fetched.
The LOOP construct in the TopCompanyValue procedure is a standard form,
exiting after the last row processes. You can rewrite this procedure in a more
compact form using a FOR loop. The FOR statement combines several
aspects of the above procedure into a single statement.
CREATE PROCEDURE TopCustomerValue2(
OUT TopCompany CHAR(36),
OUT TopValue INT )
BEGIN
-- Initialize the TopValue variable
SET TopValue = 0;
-- Do the For Loop
FOR CompanyFor AS ThisCompany
CURSOR FOR
SELECT company_name AS ThisName ,
CAST( sum( sales_order_items.quantity *
product.unit_price ) AS INTEGER )
AS ThisValue
FROM customer
INNER JOIN sales_order
INNER JOIN sales_order_items
INNER JOIN product
GROUP BY ThisName
DO
IF ThisValue > TopValue THEN
SET TopCompany = ThisName;
SET TopValue = ThisValue;
END IF;
END FOR;
END
RESUME is dictated by the ON_TSQL_ERROR option setting. For
more information, see “ON_TSQL_ERROR option [compatibility]” [ASA
Database Administration Guide, page 630].
Default error handling Generally, if a SQL statement in a procedure or trigger fails, the procedure or
trigger terminates execution and control returns to the application program
with an appropriate setting for the SQLSTATE and SQLCODE values. This
is true even if the error occurred in a procedure or trigger invoked directly or
indirectly from the first one. In the case of a trigger, the operation causing
the trigger is also undone and the error is returned to the application.
The following demonstration procedures show what happens when an
application calls the procedure OuterProc, and OuterProc in turn calls the
procedure InnerProc, which then encounters an error.
CREATE PROCEDURE OuterProc()
BEGIN
MESSAGE ’Hello from OuterProc.’ TO CLIENT;
CALL InnerProc();
MESSAGE ’SQLSTATE set to ’,
SQLSTATE,’ in OuterProc.’ TO CLIENT
END
CREATE PROCEDURE InnerProc()
BEGIN
DECLARE column_not_found
EXCEPTION FOR SQLSTATE ’52003’;
MESSAGE ’Hello from InnerProc.’ TO CLIENT;
SIGNAL column_not_found;
MESSAGE ’SQLSTATE set to ’,
SQLSTATE, ’ in InnerProc.’ TO CLIENT;
END
Notes ♦ The DECLARE statement in InnerProc declares a symbolic name for one
of the predefined SQLSTATE values associated with error conditions
already known to the server.
♦ The MESSAGE statement sends a message to the Interactive SQL
Messages pane.
♦ LOOP
♦ LEAVE
♦ CONTINUE
♦ CALL
♦ EXECUTE
♦ SIGNAL
♦ RESIGNAL
♦ DECLARE
♦ SET VARIABLE
The following statement executes the OuterProc procedure:
CALL OuterProc();
The EXCEPTION statement declares the exception handler itself. The lines
following the EXCEPTION statement do not execute unless an error occurs.
Each WHEN clause specifies an exception name (declared with a
DECLARE statement) and the statement or statements to be executed in the
event of that exception. The WHEN OTHERS THEN clause specifies the
statement(s) to be executed when the exception that occurred does not
appear in the preceding WHEN clauses.
In this example, the statement RESIGNAL passes the exception on to a
higher-level exception handler. RESIGNAL is the default action if WHEN
OTHERS THEN is not specified in an exception handler.
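For reference, an InnerProc variant with such a handler might look like the following sketch. It is reconstructed along the lines the notes below describe, not copied from the manual's original listing:
CREATE PROCEDURE InnerProc()
BEGIN
  DECLARE column_not_found
    EXCEPTION FOR SQLSTATE '52003';
  MESSAGE 'Hello from InnerProc.' TO CLIENT;
  SIGNAL column_not_found;
  MESSAGE 'Line following SIGNAL.' TO CLIENT;
  EXCEPTION
    -- Executed only if an error occurs in the statements above
    WHEN column_not_found THEN
      MESSAGE 'Column not found handling.' TO CLIENT;
    WHEN OTHERS THEN
      RESIGNAL;
END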
The following statement executes the OuterProc procedure:
CALL OuterProc();
Notes
♦ The EXCEPTION statements execute, rather than the lines following the
SIGNAL statement in InnerProc.
♦ As the error encountered was a column not found error, the
MESSAGE statement included to handle the error executes, and
SQLSTATE resets to zero (indicating no errors).
♦ After the exception handling code executes, control passes back to
OuterProc, which proceeds as if no error was encountered.
♦ You should not use ON EXCEPTION RESUME together with explicit exception handling. The exception handling code is not executed if ON EXCEPTION RESUME is included. (A sketch of the ON EXCEPTION RESUME form follows this list.)
♦ If the error handling code for the column not found exception is
simply a RESIGNAL statement, control passes back to the OuterProc
procedure with SQLSTATE still set at the value 52003. This is just as if
there were no error handling code in InnerProc. Since there is no error
handling code in OuterProc, the procedure fails.
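As a hedged sketch of the ON EXCEPTION RESUME form (again, not the manual's exact listing), the clause appears in the procedure header and takes the place of an explicit handler; the precise resumption behavior also depends on the ON_TSQL_ERROR option mentioned earlier:
CREATE PROCEDURE InnerProc()
ON EXCEPTION RESUME
BEGIN
  DECLARE column_not_found
    EXCEPTION FOR SQLSTATE '52003';
  MESSAGE 'Hello from InnerProc.' TO CLIENT;
  SIGNAL column_not_found;
  -- With ON EXCEPTION RESUME, the error need not end the procedure;
  -- the next statement can inspect SQLSTATE instead
  MESSAGE 'SQLSTATE set to ',
    SQLSTATE, ' in InnerProc.' TO CLIENT;
END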
Exception handling and atomic compound statements
When an exception is handled inside a compound statement, the compound statement completes without an active exception and the changes before the exception are not reversed. This is true even for atomic compound statements. If an error occurs within an atomic compound statement and is explicitly handled, some but not all of the statements in the atomic compound statement are executed.
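The discussion that follows refers to an example of nested compound statements. A hedged reconstruction of such an example (the procedure name and message text are illustrative) is:
CREATE PROCEDURE NestedHandler()
BEGIN
  BEGIN ATOMIC
    DECLARE column_not_found
      EXCEPTION FOR SQLSTATE '52003';
    MESSAGE 'Starting the inner compound statement.' TO CLIENT;
    SIGNAL column_not_found;
    MESSAGE 'You will not see this message.' TO CLIENT;
    EXCEPTION
      WHEN column_not_found THEN
        MESSAGE 'Column not found handling.' TO CLIENT;
      WHEN OTHERS THEN
        RESIGNAL;
  END;
  MESSAGE 'Outer compound statement.' TO CLIENT;
END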
When the SIGNAL statement that causes the error is encountered, control
passes to the exception handler for the compound statement, and the Column
not found handling message prints. Control then passes back to the
outer compound statement and the Outer compound statement message
prints.
If an error other than column not found is encountered in the inner
compound statement, the exception handler executes the RESIGNAL
statement. The RESIGNAL statement passes control directly back to the
calling environment, and the remainder of the outer compound statement is
not executed.
Using the EXECUTE IMMEDIATE statement in
procedures
The EXECUTE IMMEDIATE statement allows statements to be constructed
inside procedures using a combination of literal strings (in quotes) and
variables.
For example, the following procedure includes an EXECUTE IMMEDIATE
statement that creates a table.
CREATE PROCEDURE CreateTableProc(
  IN tablename CHAR(30) )
BEGIN
  EXECUTE IMMEDIATE 'CREATE TABLE '
    || tablename
    || '(column1 INT PRIMARY KEY)'
END
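For example, a call such as the following (the table name is arbitrary) builds a one-column table:
CALL CreateTableProc( 'mytable' );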
The EXECUTE IMMEDIATE statement can be used with queries that return
result sets. For example:
CREATE PROCEDURE DynamicResult(
  IN Columns LONG VARCHAR,
  IN TableName CHAR(128),
  IN Restriction LONG VARCHAR DEFAULT NULL )
BEGIN
  DECLARE Command LONG VARCHAR;
  SET Command = 'SELECT ' || Columns || ' FROM ' || TableName;
  IF ISNULL( Restriction, '' ) <> '' THEN
    SET Command = Command || ' WHERE ' || Restriction;
  END IF;
  EXECUTE IMMEDIATE Command;
END
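A call along the following lines (the exact invocation from the manual is not reproduced here, so treat this as a sketch) returns a result set like the one shown below:
CALL DynamicResult(
  'table_id, table_name',
  'SYS.SYSTABLE',
  'table_id <= 10' );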
table_id table_name
1 SYSTABLE
2 SYSCOLUMN
3 SYSINDEX
... ...
In ATOMIC compound statements, you cannot use an EXECUTE IMMEDIATE statement.
Transactions and savepoints in procedures and
triggers
SQL statements in a procedure or trigger are part of the current transaction
(see “Using Transactions and Isolation Levels” on page 103). You can call
several procedures within one transaction or have several transactions in one
procedure.
COMMIT and ROLLBACK are not allowed within any atomic statement
(see “Atomic compound statements” on page 670). Because triggers are
fired by INSERT, UPDATE, and DELETE statements, which are atomic
operations, COMMIT and ROLLBACK are not allowed in a trigger or in
any procedure called by a trigger.
Savepoints (see “Savepoints within transactions” on page 106) can be used
within a procedure or trigger, but a ROLLBACK TO SAVEPOINT statement
can never refer to a savepoint before the atomic operation started. Also, all
savepoints within an atomic operation are released when the atomic
operation completes.
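A hedged sketch of a savepoint inside a procedure follows; the table, column, and values are illustrative only:
CREATE PROCEDURE UpdatePhones()
BEGIN
  SAVEPOINT before_changes;
  UPDATE customer SET phone = '5555550100' WHERE id = 101;
  UPDATE customer SET phone = '5555550101' WHERE id = 102;
  -- Undo only the work done since the savepoint;
  -- the enclosing transaction remains open
  ROLLBACK TO SAVEPOINT before_changes;
END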
The alias for the result set is necessary only in the first SELECT statement,
as the server uses the first SELECT statement in the batch to describe the
result set.
A RESUME statement is necessary following each query to retrieve the next
result set.
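For instance, a batch of this shape (a sketch using the sample database tables shown earlier) returns three result sets; only the first SELECT carries the aliases:
SELECT emp_lname AS last_name, emp_fname AS first_name
FROM employee;
SELECT lname, fname
FROM customer;
SELECT last_name, first_name
FROM contact;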
Calling external libraries from procedures
You can call a function in an external library from a stored procedure or
user-defined function. You can call functions in a DLL under Windows
operating systems, in an NLM under NetWare, and in a shared object on
UNIX. You cannot call external functions on Windows CE.
This section describes how to use the external library calls in procedures.
Sample external stored procedures, plus the files required to build a DLL
containing them, are located in the following folder:
<%ASANY%>\Samples\Asa\ExternalProcedures.
Caution
External libraries called from procedures share the memory of the server.
If you call an external library from a procedure and the external library
contains memory-handling errors, you can crash the server or corrupt your
database. Ensure that you thoroughly test your libraries before deploying
them on production databases.
The API described in this section replaces an older API. Libraries written to
the older API, used in versions before version 7.0, are still supported, but in
new development you should use the new API.
Adaptive Server Anywhere includes a set of system procedures that make
use of this capability, for example to send MAPI e-mail messages.
☞ For more information on system procedures, see “System Procedures
and Functions” [ASA SQL Reference, page 747].
If you call an external DLL from a procedure, the procedure cannot carry out
any other tasks; it just forms a wrapper around the DLL.
whether the NLM is already loaded. If the NLM is not already loaded, you
must provide a library name. The file extension .nlm is optional.
☞ For more information about the CREATE PROCEDURE statement
syntax, see “CREATE PROCEDURE statement” [ASA SQL Reference,
page 355].
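As a sketch of the general shape of such a declaration (the function and library names here are hypothetical):
CREATE PROCEDURE mystring( IN instr LONG VARCHAR )
EXTERNAL NAME 'mystring@mylib';
On Windows operating systems the library would typically be a DLL (for example, mylib.dll); on UNIX it would be a shared object.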
The function must return void, and must take as arguments a structure used
to pass the arguments, and a handle to the arguments provided by the SQL
procedure.
The an_extfn_api structure has the following form:
typedef struct an_extfn_api {
  short (SQL_CALLBACK *get_value)(
    void *          arg_handle,
    a_SQL_uint32    arg_num,
    an_extfn_value *value
  );
  short (SQL_CALLBACK *get_piece)(
    void *          arg_handle,
    a_SQL_uint32    arg_num,
    an_extfn_value *value,
    a_SQL_uint32    offset
  );
  short (SQL_CALLBACK *set_value)(
    void *          arg_handle,
    a_SQL_uint32    arg_num,
    an_extfn_value *value,
    short           append
  );
  void (SQL_CALLBACK *set_cancel)(
    void * arg_handle,
    void * cancel_handle
  );
} an_extfn_api;
Notes
♦ Calling get_value on an OUT parameter returns the data type of the argument, and returns data as NULL.
♦ The get_piece function for any given argument can only be called immediately after the get_value function for the same argument.
♦ To return NULL, set data to NULL in an_extfn_value.
♦ The append field of set_value determines whether the supplied data replaces (false) or appends to (true) the existing data. You must call set_value with append=FALSE before calling it with append=TRUE for the same argument. The append field is ignored for fixed-length data types.
The header file itself contains some additional notes.
The following table shows the conditions under which the functions defined
in an_extfn_api return false:
If the DLL does not export this function, the database server ignores any
You cannot use date or time data types, and you cannot use exact numeric
data types.
To provide values for INOUT or OUT parameters, use the set_value API
function. To read IN and INOUT parameters, use the get_value API
function.
Passing NULL
You can pass NULL as a valid value for all arguments. Functions in external libraries can supply NULL as a return value for any data type.
External function return types
The following table lists the supported return types, and how they map to the return type of the SQL function or procedure.
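The following batch, for example, loops over the catalog and applies ALTER PROCEDURE ... SET HIDDEN to every procedure that is not owned by a system user: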
begin
  for hide_lp as hide_cr cursor for
    select proc_name, user_name
    from SYS.SYSPROCEDURE p, SYS.SYSUSERPERM u
    where p.creator = u.user_id
      and p.creator not in (0, 1, 3)
  do
    message 'altering ' || proc_name;
    execute immediate 'alter procedure "' ||
      user_name || '"."' || proc_name ||
      '" set hidden'
  end for
end
CHAPTER 21
Debugging Logic in the Database
About this chapter
This chapter describes how to use the Sybase debugger to assist in developing SQL stored procedures, triggers, and event handlers, as well as Java stored procedures.
Introduction to debugging in the database
You can use the debugger during the development of the following objects:
♦ SQL stored procedures, triggers, event handlers, and user-defined
functions.
Debugger features
You can carry out many tasks with the debugger, including the following:
♦ Debug procedures and triggers You can debug SQL stored
procedures and triggers.
♦ Debug Java classes You can debug Java classes that are stored in the
database.
Tutorial: Getting started with the debugger
This tutorial describes how to start the debugger, how to connect to a
database, how to debug a simple stored procedure, and how to debug a Java
class.
Right click the debugger_tutorial procedure and choose Execute from
Interactive SQL from the popup menu.
An Interactive SQL window opens and the following result set is
displayed:
top_company top_value
(NULL) (NULL)
This is clearly an incorrect result. The remainder of the tutorial diagnoses
the error that produced this result.
To diagnose the bug in the procedure, set breakpoints in the procedure and
step through the code, watching the value of variables as the procedure is
executed.
Here, you set a breakpoint at the first executable statement in the procedure.
Click to the left of this line in the vertical gray bar to set a breakpoint.
The breakpoint appears as a red circle.
3. Execute the procedure again.
a. In the left pane, right click the debugger_tutorial procedure and
choose Execute from Interactive SQL from the popup menu.
b. A message box appears, asking if you want to debug the connection
from Interactive SQL. Click Yes.
Execution of the procedure stops at the breakpoint. A yellow arrow in
the source code window indicates the current position, which is at the
breakpoint.
4. Inspect variables.
The Local variables window in the Debugger Details pane displays a list
of variables in the procedure together with their current value and data
type. The top_company, top_value, this_value, and this_company
variables are all uninitialized and are therefore NULL.
5. Step through the code.
Press F11 several times to step through the code, until you reach the
following line:
if this_value > top_value then
As you step through the lines of the stored procedure, the value of the
variables changes.
When you are at the if statement, this_value is set to 3000 and top_value
is still NULL.
6. Step into one more statement.
Press F11 once more to see which branch the execution takes. The yellow
arrow moves directly back to the label statement at the beginning of the
loop, which contains the following text:
customer_loop: loop
The if test did not return true. The test failed because a comparison of
any value to NULL returns NULL. A value of NULL fails the test and the
code inside the if...end if statement is not executed.
At this point, you may realize that the problem is the fact that top_value
is not initialized.
You can test the hypothesis that the problem is the lack of initialization for
top_value right in the debugger, without changing the procedure code.
3. Disable the breakpoint and execute the procedure.
a. Click the breakpoint so that it turns gray (disabled).
b. Press F5 to complete execution of the procedure.
The Interactive SQL window appears again. It shows the correct
results.
top_company top_value
Chadwicks 8076
The hypothesis is confirmed. The problem is that the top_value is not
initialized.
It then loops through all the rows of the result set, and returns the one with
the highest unit price.
Compiling Java classes for debugging
You must compile classes with the javac -g option in order to debug them. The sample classes are already compiled for debugging.
To work through this tutorial, you must enable the sample database to use
Java, and install JDBCExamples.class into the sample database.
For instructions, see “Setting up the Java sample” [ASA Programming Guide,
page 82].
☞ For information about the JDBCExamples class and its methods, see
“JDBC Programming” [ASA Programming Guide, page 103].
The debugger looks in a set of locations for source code files (with .java
extension). You need to add the Samples\ASA\Java subdirectory of your
installation directory to the list of locations, so that the code for the class
currently being executed in the database is available to the debugger.
4. Display the source code for the JDBCExamples class:
a. In the Sybase Central left pane, open the Java Objects folder.
b. In the right pane, open the All Java Classes folder and locate the JAR
file or class you wish to debug. Depending on your Sybase Central
settings, you may wish to click the Creator column. This sorts the
listing by creator so that classes owned by DBA appear before those
owned by SYS.
c. Double-click the JDBCExamples class.
d. In the right pane, click the Source tab. The source code for the class is
displayed.
Set a breakpoint
2. Click the gray column on the left of the line until it shows a red circle.
int max_price = 0;
This section illustrates some of the ways you can step through code in the
debugger.
Following the previous section, the debugger should have stopped execution
of JDBCExamples.Query at the first statement in the method:
Examples
Here are some example steps you can try:
1. Step to the next line Choose Debug ➤ Step Over, or press F10 to step
to the next line in the current method. Try this two or three times.
2. Run to the cursor Select the following line using the mouse, and
choose Debug ➤ Run To Cursor, or press Ctrl+F10 to run to that line and
break:
max_price = price;
A red stop sign appears in the left-hand column to mark the breakpoint.
Press F5 to execute to that breakpoint.
4. Experiment Try different methods of stepping through the code. End
with F5 to complete the execution.
The complete set of options for stepping through source code is available
from the Debug menu.
When you have completed the execution, the Interactive SQL Results
pane in the Results tab displays the value 24.
In this lesson you inspect the values of both local variables (declared in a
method) and class static variables in the debugger.
Inspecting local variables
You can inspect the values of local variables in a method as you step through the code, to better understand what is happening. You must have compiled the class with the javac -g option to do this.
❖ To inspect and modify the value of a variable
1. Set a breakpoint at the first line of the JDBCExamples.Query method.
This line is as follows:
int max_price = 0
5. Press F10 repeatedly to step through the code. As you do so, the values of
the variables appear in the Local tab list. Step through until the stmt and
result variables have values.
6. Expand the result object by clicking the icon next to it, or setting the
cursor on the line and pressing Enter. This displays the values of the
fields in the object.
Inspecting static variables
In addition to local variables, you can display class-level variables (static variables) in the debugger Statics tab, and watch their values in the Watch tab. For more information, see the debugger online Help.
Setting breakpoints
A breakpoint instructs the debugger to interrupt execution at a specified line.
When you set a breakpoint, it applies to all connections. To make a
breakpoint apply to a specific connection only, set a condition on the
breakpoint.
❖ To set a breakpoint
1. With Sybase Central running the Debug task, display the code where you
wish to set a breakpoint.
2. Click in the gray column on the left of the window, or click a line and
press F9 to set the breakpoint. A red circle indicates each line with a
breakpoint.
Disabling and enabling breakpoints
You can change the status of a breakpoint from the Sybase Central right
pane or from the Breakpoints window.
on the line
DELETE FROM contact WHERE contact.id = contact.old_id
on the line
if (max.price == price) or (price == 10)
Working with variables
The debugger lets you view and edit the behavior of your variables while
stepping through your code. The debugger provides a Debugger Details
window to display the different kinds of variables used in stored procedures
and Java classes. The Debugger Details windows appear at the bottom of the
Sybase Central window when Sybase Central is running the Debug task.
Local variables
Row variables are used to hold the values used in triggers. They are
displayed on the Row tab of the Variables window.
☞ For more information on triggers, see “Introduction to triggers” on
page 660.
Static variables are used in Java classes. They are displayed in the Statics tab.
The call stack
It is useful to examine the sequence of calls that has been made when you are debugging nested procedures or Java classes. You can view a listing of the procedures in the Call Stack tab.
The SQL special value CURRENT USER holds the user ID of the
connection.
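For example, a condition such as the following (an illustrative expression, not taken from the manual) restricts a breakpoint to connections made by the DBA user:
CURRENT USER = 'DBA'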
☞ For more information, see “Editing breakpoint conditions” on page 722,
and “CURRENT USER special value” [ASA SQL Reference, page 34].