Unit 1 Part 3 Nested Relations
Hongbo HE
September 1997
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
Contents
Abstract
Résumé
Acknowledgements
1 Introduction
1.1 Relational Model
1.1.1 Operations on Relations
1.1.2 Operations on Domains
1.2 Object Oriented Model
1.3 Object Relational Model
1.4 Nested Relation Model
1.4.1 Nested Relations
1.4.2 Nesting and Unnesting
1.4.3 Our Approach
1.5 Thesis Aim and Outline
2 Relix
2.1 Overview
2.1.1 Domains and Relations
2.1.2 Basic Commands in Relix
2.2 Relational Algebra
2.2.1 Projection
2.2.2 Selection
2.2.3 Joins
2.3 Domain Algebra
2.3.1 Horizontal Operations
2.3.2 Reduction (Vertical Operations)
2.3.3 Nested Relations
2.4 ijoin, ujoin, sjoin are Associative and Commutative
2.4.1 Definition
2.4.2 Commutative
2.4.3 Associative
2.4.4 Another Approach
5 Conclusion
5.1 Summary
5.2 Future Work
Bibliography
Chapter 1
Introduction
The value in each row under a given column is atomic, i.e., it is nondecomposable.
• unary operations
  - projection
  - selection
• binary operations
• horizontal operations
  - Constant
  - Rename
  - Function
  - If-then-else
• vertical operations
  - Reduction
  - Equivalence Reduction
  - Functional Mapping
  - Partial Functional Mapping
• Polymorphism: the ability to treat different objects the same way by sending them the same message, which elicits a semantically similar function in each object.
• Class instantiation: creating different objects of the same general description from the same class.
• Inheritance: extending one or more existing objects to create new objects that share data, behavior, and methods, in OO terminology.
Generally, ODBMSs are database systems that allow data to be stored beyond the tabular format of the relational model. They can deal with complex data structures, as programming languages do. Another way of thinking of an ODBMS is as an object-oriented programming language with persistent data, in the sense that data in the programs lives beyond the life of the programs. Its claimed strength is the ability to manipulate data and perform computations within one single system. This is said to solve the mismatch between data manipulation languages (e.g. SQL) in the relational model and ordinary programming languages.
• Support for base data type extension. These include dynamic linking of user-defined functions, client/server activation of user-defined functions, secure user-
• Support for complex objects. Three basic type constructors are available: composites, sets and references. Full-featured user-defined functions can be imposed on complex objects. Complex data types can be of arbitrary length and have SQL support.
• Support for inheritance. Both data and function inheritance are supported. Overloading is also available, as well as multiple inheritance.
• Support for a production rule system. Events and actions can be retrievals as well as updates. Rules are integrated with inheritance and type extension. There are rich execution semantics for rules and no infinite loops.
[Figure 1.1: nested relation Project with atomic attribute Manager and nested attribute Detail(PName, Budget(K)); the table body is not recoverable.]
Manager: The name of the manager who is in charge. The data is of type string
(atomic).
have redundant values for attribute Manager, or it would have had to be split
into two different relations (Project and Detail), with a foreign key, PName.
• Nested relations allow efficient query processing, since some of the joins are realized within the nested relations themselves. In our example in Figure 1.1, retrieving information about the manager's budget requires a join between Manager and Detail in the 1NF representation. No join is needed in the NF2 representation.
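[Figure 1.2: the 1NF relation Project(ProjName, Member) (left) and NEST_Member(Project) (right); legible tuples include (P1, Joe), (P1, Sue), (P1, Sam), (P2, Joe) and (P2, Mary).]
$\mathrm{UNNEST}_{\mathit{Attribute}}(\mathrm{NEST}_{\mathit{Attribute}}(\mathit{Relation})) = \mathit{Relation}$ always holds, but
$\mathrm{NEST}_{\mathit{Attribute}}(\mathrm{UNNEST}_{\mathit{Attribute}}(\mathit{Relation})) = \mathit{Relation}$ is not always true.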
The case in Figure 1.3 gives an example.
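The nest and unnest restructuring operators can be sketched in Python to make this asymmetry concrete. This is a minimal illustration, not Relix code. Relations are modeled as Python sets of tuples, and set-valued attributes as frozensets. The helper names `nest` and `unnest` are hypothetical. The sample data uses only the five legible tuples of Figure 1.2.

```python
from collections import defaultdict

def nest(rel, attrs, nested_attr):
    """NEST on nested_attr: group tuples that agree on all other attributes
    and collect their nested_attr values into one frozenset."""
    i = attrs.index(nested_attr)
    groups = defaultdict(set)
    for t in rel:
        groups[t[:i] + t[i + 1:]].add(t[i])
    # Put the set-valued attribute back in its original position.
    return {key[:i] + (frozenset(vals),) + key[i:] for key, vals in groups.items()}

def unnest(rel, attrs, nested_attr):
    """UNNEST on nested_attr: emit one flat tuple per element of the set."""
    i = attrs.index(nested_attr)
    return {t[:i] + (v,) + t[i + 1:] for t in rel for v in t[i]}

ATTRS = ("ProjName", "Member")
project = {("P1", "Joe"), ("P1", "Sue"), ("P1", "Sam"), ("P2", "Joe"), ("P2", "Mary")}

nested = nest(project, ATTRS, "Member")
flat_again = unnest(nested, ATTRS, "Member")        # UNNEST(NEST(R))

# A nested relation that is not in partitioned normal form: P1 appears twice.
non_pnf = {("P1", frozenset({"Joe"})), ("P1", frozenset({"Sue"}))}
renested = nest(unnest(non_pnf, ATTRS, "Member"), ATTRS, "Member")
```

Unnesting and renesting `non_pnf` merges its two P1 tuples into one, so the result differs from the original nested relation.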
As the price of these advantages over 1NF relations, nested relations pose a non-trivial problem of data representation [Tak89]. There are generally alternative representations of the data in a nested relation, while the data is uniquely represented by a 1NF relation. This is illustrated by the following example:
On the left side of Figure 1.2, we have a simple 1NF relation Project on ProjName and Member. This relation is a unique representation of a set of 7 tuples.
We can nest Project on attribute Member, as shown on the right side of Figure 1.2. We can also nest Project on attribute ProjName, as illustrated in Figure 1.4.
Thus, it might be controversial whether or not these two relations should be regarded as the same relation. There are two different assumptions with respect to the interpretation [Tak89]:
2. Conversely, to assume that each tuple is just a union of single values rather than a specific object. This allows the two nested relations on the right sides of Figures 1.2 and 1.4 to be identified with each other and with the original 1NF relation. Many research papers implicitly use this assumption, such as those proposing transformation operators [Jae82][Fis85] and those designing nested relations [Ozy87][Ozy89].
Significant progress has been made in the field of nested relations during the past decade. A generalization of the ordinary relational model was introduced in [Jae82][OOM87]. It allows relations with set-valued attributes and adds two restructuring operators, nest and unnest. Fischer and Van Gucht [Fis85] discussed one-level nested relations and their characterization by a new family of dependencies. They also developed a polynomial-time algorithm to test whether a structure is a one-level nested relation. Thomas and Fischer generalized this work on the one-level model and allowed nested relations of arbitrary, but fixed, depth [Tho86]. In [RKS86], Roth, Korth and Silberschatz defined a normal form called "Partitioned Normal Form (PNF)" for nested relations, and also defined algebra and calculus query languages for them. However, their proofs and method were later questioned by Tansel and Garnett [Tag92]. Numerous query languages have been introduced for the nested model [RKS86], and extensions to practical query languages such as SQL have been proposed to accommodate nesting [Pis86][Kor89]. Implementations of databases based on the nested relation model are also available, such as those in [Sps87][Des88][Sab89]. These are built either on top of existing relational databases or from scratch.
• Using flat relations, we can model nested relations. We can use a set of surrogates to keep links between parent relations and their nested child relations.
• We can build a nested relation query facility in the context of flat relations. Since an attribute itself can be a relation, relational operations can be included in domain operations.
• Chapter 3 is the user's manual on nested relations. It shows the semantics and syntax for nested relation definitions and operations.
• Chapter 5 concludes the thesis with a summary and proposals for future work.
Chapter 2
Relix
Relix is briefly described in this chapter. The purpose of this chapter is to provide readers with enough background to understand the rest of the thesis. All the design and implementation work in this thesis follows the conceptual framework of the existing Relix system. We therefore present only the subset of Relix related to this thesis. The theoretical foundation on which the development of Relix is based can be found in [Mer84], while the basic reference for Relix can be found in [Ld86].
2.1 Overview
Relix is a RELational database programming Language in UNIX. It is an interpreted language written in C. It can accept and execute commands or statements from the command line. It can also accept Relix commands and statements from batch files.
Relix deals primarily with two kinds of data models: domains and relations. There are two categories of operations: domain algebra and relational algebra.
There are six atomic data types in Relix, as shown in Figure 2.2. Note that we also have a special data type, relation, which will be introduced in Chapter 3.
In Relix, we can declare the domains of relation Student as follows:
> domain Stu-id integer;
> domain Enter-year integer;
> domain Name string;
> domain Canadian boolean;
We can also declare a relation without initialization, i.e., a relation without any data:
> relation Student (Stu-id, Enter-year, Name, Canadian);
Show Commands
• sd! or sd!!<domainname>
Relix will show the name, type and other information associated with all domains in the database, or with the specified domain. For example:
• sr! or sr!!<relationname>
Relix will show the name, degree and other information of all relations in the database, or of the specified relation. For instance:
Relix will show all relations and their domains in the database, or the specified relation and its domains. For example:
• pr!!<relationname>
• dd!!<domainname>
Relix will delete the specified domain. If it is still in use, Relix will give an
error message and the domain will not be deleted.
> dd!!Year
will delete domain Year, if it is not in use.
q!
2.2.1 Projection
Projection is an operation on the attributes of a given relation. The result of a projection is a relation whose attributes are the attributes specified in the projection list. Duplicate tuples in the resulting relation are removed. For example, we can project the Name attribute of the Student relation as follows:
Stu-name
--------
Jin
Joe
Sue
Selection is an operation on a relation that selects the tuples meeting the condition specified in the selection clause, which is called the T-selector (tuple selector). We can do the following selection to extract the information about students who are Canadian:
> Ca-stu <- where Canadian = true in Student;
or
> Ca-stu <- where Canadian in Student;
We can combine projection and selection in a single statement. Relix first performs the selection on the input relation based on the selection clause, then performs the projection on the output of the selection. We can extract the Stu-id numbers of students who are Canadian using the following statement:
Ca-stu-id
---------
Stu-id
---------
9546900
9602324
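The projection and selection semantics above can be mimicked in a few lines of Python. These are duplicate elimination in projection, and selection before projection in a combined statement. This is a sketch under stated assumptions, not Relix's implementation. A relation is a list of dicts, and the T-selector is a Python predicate. Enter-year is omitted. The Stu-id/Name pairs are read from this chapter's join examples. The two Jin tuples are assumed non-Canadian, consistent with the result above.

```python
def project(rel, attrs):
    """[attrs] in rel: keep only the listed attributes; building a set
    removes duplicate tuples, as Relix projection does."""
    return {tuple(t[a] for a in attrs) for t in rel}

def select(rel, t_selector):
    """where t_selector in rel: keep the tuples satisfying the predicate."""
    return [t for t in rel if t_selector(t)]

# Illustrative Student tuples; Enter-year omitted (assumption).
student = [
    {"Stu-id": 9546900, "Name": "Joe", "Canadian": True},
    {"Stu-id": 9602324, "Name": "Sue", "Canadian": True},
    {"Stu-id": 9701087, "Name": "Jin", "Canadian": False},
    {"Stu-id": 9702340, "Name": "Jin", "Canadian": False},
]

stu_name = project(student, ["Name"])            # the two Jin tuples collapse
ca_stu = select(student, lambda t: t["Canadian"])
# Selection first, then projection on its output.
ca_stu_id = project(select(student, lambda t: t["Canadian"]), ["Stu-id"])
```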
2.2.3 Joins
There are two classes of join operations in Relix: μ-joins, the family of set-valued set operations; and σ-joins, the family of logical-valued set operations [Mer84].
μ-joins are derived from set operators such as intersection, union and difference. The μ-joins on two relations, R(X, Y) and S(Y, Z), are based on three parts:
• $\text{center} \triangleq \{(x, y, z) \mid (x, y) \in R \text{ and } (y, z) \in S\}$
• $\text{left wing} \triangleq \{(x, y, DC) \mid (x, y) \in R \text{ and } \forall z\, (y, z) \notin S\}$
• $\text{right wing} \triangleq \{(DC, y, z) \mid (y, z) \in S \text{ and } \forall x\, (x, y) \notin R\}$
We will explain these three basic μ-joins in detail in this section. The two relations in Figure 2.3 are used to illustrate the operations:
• The most used μ-join is the natural join (ijoin or natjoin), which gives us the center part of the operand relations. It combines tuples of the two relations that have equal values on the join attributes. That is, it is the intersection of the two relations on the join attributes.
[Figure 2.3: operand relations Student and Courses(Stu-id, C-name); the table bodies are not recoverable.]
The following Relix statement performs a natjoin between relation Student and relation Courses.
• The union join (ujoin) combines the tuples from the natural join with the tuples from either operand relation that have no matching tuple in the other relation on the join attributes. The missing attributes are filled with the DC null value. It gives us the union of the left wing, center and right wing of the operand relations.
Stu-id   Name  C-name
---------------------
9546900  Joe   Physics
9576701  DC    Math
9602324  Sue   History
9602324  Sue   Math
9701087  Jin   DC
9702340  Jin   DC
• The symmetric difference join (sjoin) is the set of tuples from either operand relation that have no matching tuple in the other relation on the join attributes. The missing attributes are filled with the DC null value. It gives us the union of the left and right wings of the operand relations.
Stu-id   Name  C-name
---------------------
9576701  DC    Math
9701087  Jin   DC
9702340  Jin   DC
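The three μ-join parts can be computed directly from their set definitions. The Python sketch below is an illustration, not the Relix implementation. Relations are sets of tuples with an explicit attribute list, and `None` stands in for the DC null. Result tuples are ordered (X, Y, Z) as in the definitions, i.e. (Name, Stu-id, C-name). Figure 2.3 itself is not legible, so the Student and Courses tuples are read back from the ujoin result above.

```python
DC = None  # stands in for Relix's DC null value

def mu_join(r, r_attrs, s, s_attrs, parts):
    """Union of the requested parts ('left', 'center', 'right') of the
    mu-join of R and S; the join attributes Y are those both relations share."""
    y = [a for a in r_attrs if a in s_attrs]
    x = [a for a in r_attrs if a not in y]
    z = [a for a in s_attrs if a not in y]

    def pick(t, attrs, names):
        return tuple(t[attrs.index(n)] for n in names)

    r_y = {pick(t, r_attrs, y) for t in r}
    s_y = {pick(u, s_attrs, y) for u in s}
    out = set()
    if "center" in parts:
        out |= {pick(t, r_attrs, x) + pick(t, r_attrs, y) + pick(u, s_attrs, z)
                for t in r for u in s if pick(t, r_attrs, y) == pick(u, s_attrs, y)}
    if "left" in parts:     # R tuples with no partner in S, padded with DC
        out |= {pick(t, r_attrs, x) + pick(t, r_attrs, y) + (DC,) * len(z)
                for t in r if pick(t, r_attrs, y) not in s_y}
    if "right" in parts:    # S tuples with no partner in R, padded with DC
        out |= {(DC,) * len(x) + pick(u, s_attrs, y) + pick(u, s_attrs, z)
                for u in s if pick(u, s_attrs, y) not in r_y}
    return out

def ijoin(r, ra, s, sa):
    return mu_join(r, ra, s, sa, {"center"})

def ujoin(r, ra, s, sa):
    return mu_join(r, ra, s, sa, {"left", "center", "right"})

def sjoin(r, ra, s, sa):
    return mu_join(r, ra, s, sa, {"left", "right"})

STUDENT_ATTRS, COURSES_ATTRS = ["Name", "Stu-id"], ["Stu-id", "C-name"]
student = {("Joe", 9546900), ("Sue", 9602324), ("Jin", 9701087), ("Jin", 9702340)}
courses = {(9546900, "Physics"), (9576701, "Math"), (9602324, "History"), (9602324, "Math")}
```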
σ-joins
The family of σ-joins is based on set comparison operators. In σ-join operations, the tuples in each operand relation are grouped so that all the non-join attributes within a group are identical. The set comparison operator is then applied to the Cartesian product of the groups. For each pair of groups, the non-join attribute values are accepted if the specified set comparison on the join attributes is satisfied.
There are five σ-joins:
Student              Class
Name  Course         Course     Room
----------------     ----------------
Joe   Math           Math       286
Joe   Physics        Physics    286
Sue   Physics        Chemistry  302
Jin   Math           Physics    312
Consider the following query: find students and the classrooms such that the courses the student has taken are a subset of the courses given in that classroom.
Name  Room
----------
Joe   286
Jin   286
Sue   286
----------
Equal Set
Not Superset
Not Subset
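A σ-join can be sketched the same way: group each operand on its non-join attributes, then compare the resulting sets of join values pairwise. This is an illustrative Python version, not Relix code. `compare` is any set comparison; here it is `<=`, the subset join used in the query above. The tuples are those of the Student and Class relations above. With these tuples, the subset test also pairs Sue with room 312, since {Physics} ⊆ {Physics}.

```python
from collections import defaultdict

def group(rel, attrs, join_attrs):
    """Map each combination of non-join values to the set of join values."""
    j = [attrs.index(a) for a in join_attrs]
    k = [i for i, a in enumerate(attrs) if a not in join_attrs]
    g = defaultdict(set)
    for t in rel:
        g[tuple(t[i] for i in k)].add(tuple(t[i] for i in j))
    return g

def sigma_join(r, r_attrs, s, s_attrs, compare):
    """Pair the groups of R and S whose join-value sets satisfy compare."""
    join = [a for a in r_attrs if a in s_attrs]
    gr, gs = group(r, r_attrs, join), group(s, s_attrs, join)
    return {kr + ks for kr, vr in gr.items() for ks, vs in gs.items() if compare(vr, vs)}

student = {("Joe", "Math"), ("Joe", "Physics"), ("Sue", "Physics"), ("Jin", "Math")}
klass = {("Math", 286), ("Physics", 286), ("Chemistry", 302), ("Physics", 312)}

# "sub": the student's courses are a subset of the courses given in the room.
result = sigma_join(student, ["Name", "Course"], klass, ["Course", "Room"],
                    lambda a, b: a <= b)
```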
renaming
let stu-name be name;
if-then-else
let Grade be if Mark > 60 then "Pass" else "Fail";
All the domains defined above are virtual domains. For example, we can actualize Grade as follows:
> GRADES <- [Name, Grade] in MARKS;
MARKS          GRADES
Name  Mark     Name  Grade
----------     -----------
Joe   50       Joe   Fail
Jin   80       Jin   Pass
Sue   90       Sue   Pass
Simple Reduction
Simple reduction produces a single result from the values of a single attribute over all the tuples in the relation [Mer84]. The operator in simple reduction must be both commutative and associative, such as plus (+) or multiplication (*). For example:
Transcript
-----------------------------
Name  Dept  Grade  (Total)
-----------------------------
Joe   CS    85     330
Jin   CS    90     330
Sue   EE    80     330
Weny  ME    75     330
-----------------------------
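The operator must be commutative and associative so that the result does not depend on the order in which tuples are scanned. The Python sketch below is illustrative only; `red` is a hypothetical helper, not Relix's implementation. It computes Total over the Transcript grades. It then contrasts + with subtraction, which is neither commutative nor associative.

```python
from functools import reduce
import operator

def red(op, rel, attr, new_attr):
    """Simple reduction: fold op over attr in every tuple, then attach the
    single result to each tuple as new_attr."""
    result = reduce(op, (t[attr] for t in rel))
    return [{**t, new_attr: result} for t in rel]

transcript = [
    {"Name": "Joe", "Dept": "CS", "Grade": 85},
    {"Name": "Jin", "Dept": "CS", "Grade": 90},
    {"Name": "Sue", "Dept": "EE", "Grade": 80},
    {"Name": "Weny", "Dept": "ME", "Grade": 75},
]

with_total = red(operator.add, transcript, "Grade", "Total")
# Scanning the tuples in reverse order gives the same Total for + ...
same = red(operator.add, transcript[::-1], "Grade", "Total")[0]["Total"]
# ... but not for subtraction: 85-90-80-75 versus 75-80-90-85.
forward = reduce(operator.sub, (t["Grade"] for t in transcript))
backward = reduce(operator.sub, (t["Grade"] for t in reversed(transcript)))
```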
Equivalence reduction is like simple reduction, but produces a different result for each set of tuples in the relation. Each set consists of all the tuples having the same value for some specified attributes, an "equivalence class" in mathematical terminology [Mer84]:
Transcript
-----------------------------
Name  Dept  Grade  (Subtotal)
-----------------------------
Joe   CS    85     175
Jin   CS    90     175
Sue   EE    80     80
Weny  ME    75     75
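Equivalence reduction differs from simple reduction only in keeping one accumulator per equivalence class. The sketch below uses a hypothetical helper, `equiv`, and keys a dictionary on the grouping attributes. That is one possible strategy, not necessarily Relix's. Grouping the Transcript tuples on Dept reproduces the Subtotal column.

```python
import operator

def equiv(op, rel, attr, by, new_attr):
    """Equivalence reduction: fold op over attr separately within each
    class of tuples that agree on the attributes listed in by."""
    acc = {}
    for t in rel:
        key = tuple(t[b] for b in by)
        acc[key] = op(acc[key], t[attr]) if key in acc else t[attr]
    return [{**t, new_attr: acc[tuple(t[b] for b in by)]} for t in rel]

transcript = [
    {"Name": "Joe", "Dept": "CS", "Grade": 85},
    {"Name": "Jin", "Dept": "CS", "Grade": 90},
    {"Name": "Sue", "Dept": "EE", "Grade": 80},
    {"Name": "Weny", "Dept": "ME", "Grade": 75},
]

with_subtotal = equiv(operator.add, transcript, "Grade", ["Dept"], "Subtotal")
```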
techniques.
2.4.1 Definition
For relations R(X, Y) and S(Y, Z), the three sets of tuples below are each defined on the attributes (or attribute groups) X, Y, Z.
We first define three disjoint sets of tuples, which are set operations between R and S [Mer84]:
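1. $\text{center} \triangleq \{(x, y, z) \mid (x, y) \in R \text{ and } (y, z) \in S\}$
2. $\text{left wing} \triangleq \{(x, y, DC) \mid (x, y) \in R \text{ and } \forall z\, (y, z) \notin S\}$
3. $\text{right wing} \triangleq \{(DC, y, z) \mid (y, z) \in S \text{ and } \forall x\, (x, y) \notin R\}$

1. $R \text{ ijoin } S \triangleq \text{center}$
2. $R \text{ ujoin } S \triangleq \text{left wing} \cup \text{center} \cup \text{right wing}$
3. $R \text{ sjoin } S \triangleq \text{left wing} \cup \text{right wing}$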
2.4.2 Commutative
By definition, a binary operator $\theta$ is commutative iff $A \,\theta\, B = B \,\theta\, A$.
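Remark 1: R ijoin S = S ijoin R.
Proof:
$R \text{ ijoin } S = \{(x, y, z) \mid (x, y) \in R \text{ and } (y, z) \in S\}$ (from the definition)
$\Rightarrow R \text{ ijoin } S = \{(x, y, z) \mid (y, z) \in S \text{ and } (x, y) \in R\}$ (from the commutativity of "and")
$\Rightarrow R \text{ ijoin } S = S \text{ ijoin } R$
Remark 2: R sjoin S = S sjoin R.
Proof:
$R \text{ sjoin } S = \{(x, y, DC) \mid (x, y) \in R \text{ and } \forall z\, (y, z) \notin S\} \cup \{(DC, y, z) \mid (y, z) \in S \text{ and } \forall x\, (x, y) \notin R\}$ (from the definition)
$\Rightarrow R \text{ sjoin } S = \{(DC, y, z) \mid (y, z) \in S \text{ and } \forall x\, (x, y) \notin R\} \cup \{(x, y, DC) \mid (x, y) \in R \text{ and } \forall z\, (y, z) \notin S\}$ (from the commutativity of $\cup$)
$\Rightarrow R \text{ sjoin } S = S \text{ sjoin } R$ (from the symmetry of the definition)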
Remark 3: R ujoin S = S ujoin R.
Since $R \text{ ujoin } S = (R \text{ ijoin } S) \cup (R \text{ sjoin } S)$ (from the definition), the result follows directly from Remarks 1 and 2.
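Remarks 1 to 3 can be spot-checked mechanically. The Python sketch below tests one example; it is not a proof. It represents each tuple as a frozenset of (attribute, value) pairs, so column order does not matter. This matches reading S ijoin R by attribute names. `None` stands for DC, and the Student and Courses tuples are those from Section 2.2.3. The helper `mu_join` is hypothetical and assumes non-empty operands.

```python
DC = None  # stands in for the DC null value

def mu_join(r, s, parts):
    """r and s are non-empty lists of dicts with uniform keys. Returns a set
    of frozensets of (attribute, value) pairs, so column order is ignored."""
    ra, sa = set(r[0]), set(s[0])
    y = ra & sa                                  # shared (join) attributes

    def key(t):
        return tuple(sorted((a, t[a]) for a in y))

    r_keys, s_keys = {key(t) for t in r}, {key(u) for u in s}
    out = set()
    if "center" in parts:
        out |= {frozenset({**t, **u}.items()) for t in r for u in s if key(t) == key(u)}
    if "left" in parts:
        out |= {frozenset({**t, **dict.fromkeys(sa - y, DC)}.items())
                for t in r if key(t) not in s_keys}
    if "right" in parts:
        out |= {frozenset({**dict.fromkeys(ra - y, DC), **u}.items())
                for u in s if key(u) not in r_keys}
    return out

IJOIN, UJOIN, SJOIN = {"center"}, {"left", "center", "right"}, {"left", "right"}

student = [{"Stu-id": 9546900, "Name": "Joe"}, {"Stu-id": 9602324, "Name": "Sue"},
           {"Stu-id": 9701087, "Name": "Jin"}, {"Stu-id": 9702340, "Name": "Jin"}]
courses = [{"Stu-id": 9546900, "C-name": "Physics"}, {"Stu-id": 9576701, "C-name": "Math"},
           {"Stu-id": 9602324, "C-name": "History"}, {"Stu-id": 9602324, "C-name": "Math"}]
```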
2.4.3 Associative
By definition, a binary operator $\theta$ is associative iff $(A \,\theta\, B) \,\theta\, C = A \,\theta\, (B \,\theta\, C)$.
Suppose we have 3 relations, R(X,Y), S(Y,Z), T(Z,W)
Remark 4: (R ijoin S) ijoin T = R ijoin (S ijoin T)
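Proof:
$(R \text{ ijoin } S) \text{ ijoin } T = \{(x, y, z) \mid (x, y) \in R \text{ and } (y, z) \in S\} \text{ ijoin } T$ (from the definition)
$(R \text{ sjoin } S) \text{ sjoin } T = (\text{left wing}(R, S) \cup \text{right wing}(R, S)) \text{ sjoin } T$
$= (\{(x, y, DC) \mid (x, y) \in R \text{ and } \forall z\, (y, z) \notin S\} \cup \{(DC, y, z) \mid (y, z) \in S \text{ and } \forall x\, (x, y) \notin R\}) \text{ sjoin } T$ (from the definition)
$\Rightarrow (R \text{ sjoin } S) \text{ sjoin } T = \{(x, y, DC, DC) \mid (x, y) \in R \text{ and } \forall z\, (y, z) \notin S \text{ and } \forall w\, (DC, w) \notin T\} \cup \{(DC, y, z, DC) \mid (y, z) \in S \text{ and } \forall x\, (x, y) \notin R \text{ and } \forall w\, (z, w) \notin T\} \cup \{(DC, DC, z, w) \mid (z, w) \in T \text{ and } \forall y\, (y, z) \notin S \text{ and } \forall x\, (x, DC) \notin R\}$ (from the definition)
In the same way, we can get:
$R \text{ sjoin } (S \text{ sjoin } T) = \{(x, y, DC, DC) \mid (x, y) \in R \text{ and } \forall z\, (y, z) \notin S \text{ and } \forall w\, (DC, w) \notin T\} \cup \{(DC, y, z, DC) \mid (y, z) \in S \text{ and } \forall x\, (x, y) \notin R \text{ and } \forall w\, (z, w) \notin T\} \cup$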
3. For $R = R_1 \text{ sjoin } R_2 \text{ sjoin } \cdots \text{ sjoin } R_n$ and for some tuple $x$, if $X_1 \oplus X_2 \oplus \cdots \oplus X_n = 1$, then $x \in R$.
From the properties of $\oplus$, we can conclude that if $x$ appears an odd number of times in relations $R_1, \ldots, R_n$, then $x \in R$.
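The parity characterization is easiest to see when all operands share the same attributes (X and Z empty). Then the center of R and S is R ∩ S, the left wing is R − S and the right wing is S − R. So sjoin reduces to symmetric difference. The Python sketch below is illustrative and restricted to that special case. It checks associativity and the odd-count rule on three small, made-up relations.

```python
from collections import Counter
from functools import reduce

def sjoin_same_schema(r, s):
    """With identical schemas, left wing ∪ right wing = (R - S) ∪ (S - R)."""
    return (r - s) | (s - r)

def odd_count_tuples(relations):
    """Tuples that occur in an odd number of the given relations."""
    counts = Counter(t for rel in relations for t in rel)
    return {t for t, n in counts.items() if n % 2 == 1}

r1 = {("a",), ("b",), ("c",)}
r2 = {("b",), ("c",), ("d",)}
r3 = {("c",), ("d",), ("e",)}

left_assoc = sjoin_same_schema(sjoin_same_schema(r1, r2), r3)
right_assoc = sjoin_same_schema(r1, sjoin_same_schema(r2, r3))
n_ary = reduce(sjoin_same_schema, [r1, r2, r3])   # R1 sjoin R2 sjoin R3
```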
This chapter describes how to define and manipulate nested relations in Relix. Section 3.1 explains the basic concept of nested relations in Relix and presents the initialization of nested relations. Section 3.2 illustrates the operations that can be imposed on nested relations.
The above Relix commands are used to initialize the sample nested relation in Figure 3.1.
Figure 3.1: Sample nested relation TEST: schema tree and value table
We have three regular domains A, B and C, which are defined as integers, and a nested domain S, which is defined upon A and B. When we declare TEST, it includes the nested domain S. Relix will consider S as a domain as well as a relation.
The data in S is stored in another relation outside the parent relation TEST, which has the same name as S. References to the data (called RELATION .id) are stored in attribute S of relation TEST. However, this method of implementation is largely transparent to users, who manipulate the attributes of nested domains as if
Any Relix operation that displays an attribute of type RELATION will display the attribute as a number. The actual data of the attribute is printed below it as a separate relation whose .id field links it to its parent. In the print command above, TEST and its nested domain S are printed out. In the child relation S, .id is mapped to attribute S of TEST.
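The surrogate scheme can be illustrated with a small Python sketch. It is an assumption-laden model, not Relix's storage code. Each parent tuple's nested value gets an integer surrogate numbered from 0, and the parent keeps only that number in attribute S. The child relation S stores (.id, A, B) rows. The parent attribute C and all values are hypothetical. Only two levels of nesting are modeled, as in the current implementation.

```python
def flatten(rows, nested):
    """Split nested rows into a parent relation that holds integer
    surrogates and a child relation of (.id, *tuple) rows."""
    parent, child = [], []
    for surrogate, row in enumerate(rows):
        parent.append({**row, nested: surrogate})
        child.extend((surrogate,) + t for t in sorted(row[nested]))
    return parent, child

def unflatten(parent, child, nested):
    """Rebuild the nested values by joining child rows on .id."""
    by_id = {}
    for surrogate, *rest in child:
        by_id.setdefault(surrogate, set()).add(tuple(rest))
    return [{**p, nested: by_id.get(p[nested], set())} for p in parent]

# Hypothetical contents of TEST(C, S(A, B)).
test = [{"C": 1, "S": {(1, 2), (3, 4)}}, {"C": 2, "S": {(6, 5), (4, 9)}}]
parent, child = flatten(test, "S")
```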
Note that in the following sections we will use the conceptual format shown in Figure 3.1 for the examples, while in Relix the actual format will be as in pr!, i.e., as shown in Figure 3.2.
So far, we have only implemented two levels of nesting. Future work is needed to support multiple levels of nesting.
Professor                  Secretary
Name   Salary  Commit      Name    Salary  Commit
Pat    65      PADS        Sal     35      PODS
Paul   55      PODS        Sue     38      PODS
Pully  50      SIGM
Pat    65      PADS        Sandy   36      IEEE
Paul   55      PODS        Sharon  35      PODS
Piree  54      IEE         Sam     40      PODS
(The Dept and Building columns and the remaining rows are not legible.)
Figure 3.4: The nested relation, Engineering Department, over the schema in Fig.3.3
Simple Reduction
Recall that we have already proved that ijoin, ujoin and sjoin are all commutative and associative (see Section 2.4). We can therefore extend the reduction operations to ijoin, ujoin, and sjoin.
We start with the following example. Suppose we want to find all the professors in the faculty of engineering; we can use the following query:
> let EngProf be red ujoin of Professor;
> AllEngProf <- [EngProf] in FactEng;
> pr!! AllEngProf
'red' <binary-operator> 'of' <nested-domainname>
<binary-operator> := 'ijoin' | 'ujoin' | 'sjoin'
Now we introduce the universal professor, who works in every unit of an education
organization.
UEP
Pat 65 PADS
Figure 3.6: All universal engineering professors
Pat 65 PADS
Ping 57 MEE
Piree 54 IEE
Pully 50 SIGM
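Every Professor value has the same attributes (Name, Salary, Commit). On such values, the three μ-joins reduce to union, intersection and symmetric difference. So `red ujoin`, `red ijoin` and `red sjoin` become folds of those set operations over the tuples of FactEng. The Python sketch below uses illustrative department contents assembled from the legible rows of this chapter. That is an assumption, since Figure 3.4 is only partly legible. The `red` helper is hypothetical.

```python
from functools import reduce
import operator

# Same-schema nested values: the mu-joins reduce to set operations.
UJOIN, IJOIN, SJOIN = operator.or_, operator.and_, operator.xor

def red(op, rel, nested):
    """Simple reduction of a nested domain: fold op over its values."""
    return reduce(op, (row[nested] for row in rel))

PAT, PAUL, PULLY = ("Pat", 65, "PADS"), ("Paul", 55, "PODS"), ("Pully", 50, "SIGM")
PIREE, PING = ("Piree", 54, "IEE"), ("Ping", 57, "MEE")
fact_eng = [
    {"Dept": "CS", "Professor": frozenset({PAT, PAUL, PULLY})},
    {"Dept": "EE", "Professor": frozenset({PAT, PAUL, PIREE})},
    {"Dept": "ME", "Professor": frozenset({PAT, PING})},
]

all_eng_prof = red(UJOIN, fact_eng, "Professor")   # every engineering professor
universal = red(IJOIN, fact_eng, "Professor")      # works in every department
odd_count = red(SJOIN, fact_eng, "Professor")      # in an odd number of departments
```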
Equivalence Reduction
Like simple reduction, equivalence reduction is extended to ujoin, ijoin and sjoin as well.
Query: Find the professors in each building.
PbB
Pat 65 PADS
Ping 57 MEE
Query: Find the universal professors by building. (We introduced the idea of a universal professor in the last section. Here, a universal professor of a building works in every department of that building.)
UBP
Building     UnivBuildProf
             Name  Salary  Commit
MC           Pat   65      PADS
             Paul  55      PODS
[illegible]  Pat   65      PADS
             Ping  57      MEE
Figure 3.9: Universal Professors in each Building
Query: Find the professors in each building who are assigned odd department
positions in that building.
OBP
Building  PureBuilProf
MC        Piree  54  IEE
          Pully  50  SIGM
Figure 3.10: Professors who are assigned odd positions in the building
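Equivalence reduction on a nested domain keeps one fold per building instead of one overall. In this Python sketch, CS and EE are placed in building MC, following Figures 3.9 and 3.10. ME's building name is illegible, so "B2" is a made-up placeholder. The department contents are the same illustrative assumption as in the previous sketch, and `equiv` is a hypothetical helper.

```python
from collections import defaultdict
from functools import reduce
import operator

def equiv(op, rel, nested, by):
    """Equivalence reduction of a nested domain: one fold per value of by."""
    classes = defaultdict(list)
    for row in rel:
        classes[row[by]].append(row[nested])
    return {k: reduce(op, vals) for k, vals in classes.items()}

PAT, PAUL, PULLY = ("Pat", 65, "PADS"), ("Paul", 55, "PODS"), ("Pully", 50, "SIGM")
PIREE, PING = ("Piree", 54, "IEE"), ("Ping", 57, "MEE")
fact_eng = [
    {"Dept": "CS", "Building": "MC", "Professor": frozenset({PAT, PAUL, PULLY})},
    {"Dept": "EE", "Building": "MC", "Professor": frozenset({PAT, PAUL, PIREE})},
    {"Dept": "ME", "Building": "B2", "Professor": frozenset({PAT, PING})},
]

# equiv ijoin: professors in every department of the building.
univ_build_prof = equiv(operator.and_, fact_eng, "Professor", "Building")
# equiv sjoin: professors in an odd number of the building's departments.
pure_buil_prof = equiv(operator.xor, fact_eng, "Professor", "Building")
```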
Syntax:
Binary Operations
Binary relational operations take two relations as operands and produce a relation as the result. We extend those operations to nested domains. They then take two nested domains as operands and produce a nested domain as the result, which itself is of the relation data type.
Query: Find all the staff of the faculty of engineering.
> let Staff be Professor ujoin Secretary;
> FactEngStaff <- [Dept, Building, Staff] in FactEng;
> pr!! FactEngStaff
Pat PADS
Piree IEE
Sandy IEEE
Sharon PODS
Sam PODS
Pat PADS
Ping MEE
Sandra MEE
SY~ MDS
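A binary operation on nested domains is evaluated once per tuple of the parent relation. The sketch below uses a hypothetical helper `let_binary` and only a CS row. The CS Secretary tuples are taken from the legible part of Figure 3.4, and that department assignment is an assumption. Professor and Secretary share (Name, Salary, Commit), so ujoin is modeled as set union.

```python
import operator

def let_binary(rel, op, left, right, new):
    """let new be left <op> right: apply op to the two nested values of each
    tuple and store the result, itself a relation, in attribute new."""
    return [{**row, new: op(row[left], row[right])} for row in rel]

fact_eng = [
    {"Dept": "CS",
     "Professor": frozenset({("Pat", 65, "PADS"), ("Paul", 55, "PODS"),
                             ("Pully", 50, "SIGM")}),
     "Secretary": frozenset({("Sal", 35, "PODS"), ("Sue", 38, "PODS")})},
]

# Same schema on both sides, so ujoin behaves as set union here.
with_staff = let_binary(fact_eng, operator.or_, "Professor", "Secretary", "Staff")
```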
General Operation
We can also embed general relational expressions into the domain algebra. This is called a general operation. "General" here means more general than the operations introduced earlier in this chapter. However, it is not arbitrarily general; we will show the limitations imposed on it at the end of this chapter.
In the Faculty of Engineering, rich professors are professors whose yearly salary equals or exceeds 55K. We have the query: find the rich engineering professors together with their salary and department. The following expression will answer the query:
> let RichProf be "<[Name, Salary] where Salary >= 55 in Professor>";
> RP <- [Dept, RichProf] in FactEng;
> pr!! RP;
Dept  RichProf
      Name  Salary
CS    Pat   65
      Paul  55
EE    Pat   65
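A general operation can be modeled as an arbitrary function of the whole tuple. In this sketch, a Python function stands in for the quoted relational expression; `let_general` and `rich_prof` are hypothetical names. The CS and ME Professor contents are illustrative, assembled from the legible rows of this chapter.

```python
def let_general(rel, expr, new):
    """General operation: expr is any function of a whole tuple, standing in
    for the relational expression quoted in the let statement."""
    return [{**row, new: expr(row)} for row in rel]

def rich_prof(row):
    # [Name, Salary] where Salary >= 55 in Professor
    return frozenset((name, salary) for name, salary, _commit in row["Professor"]
                     if salary >= 55)

fact_eng = [
    {"Dept": "CS", "Professor": frozenset({("Pat", 65, "PADS"), ("Paul", 55, "PODS"),
                                           ("Pully", 50, "SIGM")})},
    {"Dept": "ME", "Professor": frozenset({("Pat", 65, "PADS"), ("Ping", 57, "MEE")})},
]

# RP <- [Dept, RichProf] in FactEng
rp = {(row["Dept"], row["RichProf"])
      for row in let_general(fact_eng, rich_prof, "RichProf")}
```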
We can make more complicated general operations. For example, we can do a σ-join on different domain names in two nested domain relations.
Query: Find professors and secretaries such that the secretary works for all the committees to which the professor belongs.
> let Pname be Name;
> let Sname be Name;
> let ProfSecr be
"<([Pname, Commit] in Professor) sub ([Sname, Commit] in Secretary)>";
> PSC <- [Dept, ProfSecr] in ED;
> pr!! PSC
PSC
Dept  ProfSecr
      Pname  Sname
CS    Paul   Sal
      Paul   Sue
EE    Piree  Sandy
ME    Ping   Sandra
as string, yet during the actualization, the Relix statement included in the string will
be evaluated.
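The ProfSecr expression is a σ-join (sub) evaluated inside each tuple of ED. The Python sketch below is illustrative. Nested values are given directly as (Name, Commit) pairs, i.e. after the [Pname, Commit] and [Sname, Commit] projections. The illegible ME secretary is left out, and `sub_join` is a hypothetical helper.

```python
from collections import defaultdict

def sub_join(r, s):
    """([Pname, Commit] in r) sub ([Sname, Commit] in s): pair each professor
    with each secretary whose set of committees contains the professor's."""
    prof, secr = defaultdict(set), defaultdict(set)
    for name, commit in r:
        prof[name].add(commit)
    for name, commit in s:
        secr[name].add(commit)
    return {(p, q) for p, pc in prof.items() for q, qc in secr.items() if pc <= qc}

ed = [
    {"Dept": "CS",
     "Professor": {("Pat", "PADS"), ("Paul", "PODS"), ("Pully", "SIGM")},
     "Secretary": {("Sal", "PODS"), ("Sue", "PODS")}},
    {"Dept": "ME",
     "Professor": {("Pat", "PADS"), ("Ping", "MEE")},
     "Secretary": {("Sandra", "MEE")}},
]

# PSC <- [Dept, ProfSecr] in ED
psc = {row["Dept"]: sub_join(row["Professor"], row["Secretary"]) for row in ed}
```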
Chapter 4
Implementation of Nested Relations
This chapter deals with the implementation of nested relations. Section 4.1 gives an overview of the implementation of Relix. Section 4.2 describes how nested relations are represented and declared. Section 4.3 illustrates the implementation of nested relation operations.
• .type is the data type of the domain. There are 6 atomic data types (see Figure 2.2).
¹ File names beginning with a period (.) are UNIX hidden files, which are not normally listed by the UNIX list-directory command.
² In Relix convention, names that begin with a period (.) are system names.
1. .nst (.sup-name, .sub-name)
The .nst system relation contains information about parent relations and their child relations.
the intermediate code and calls particular C functions to perform the operations. Figure 4.1 summarizes the main flow of Relix.
[Figure 4.1: main flow of Relix. Legible boxes: Load system; Wait for input from the user; I-code; Interpreter Module; Interpret I-code.]
The parser performs syntax analysis and finds that the above statement fits the following grammar rules.
domain_declaration:
    DOMAIN_DEC identifier
        { translater(DOMAIN_DEC); }
    TYPE
        { translater(IDENTIFIER); translater(TYPE); }
    ;
Actions in Yacc are C code enclosed in a pair of curly braces. The translater function is a C function that performs various tasks according to its actual parameters. The tasks of the translater function include:
• generating I-code
For instance, the call 'translater(IDENTIFIER)' pushes the value of the identifier onto the scalar stack.
Some of the parameters produce I-code. For example:
parameter     I-code
DOMAIN_DEC    global-dom
TYPE          push-name a domain
'a' is a string obtained by popping an item from the scalar stack. The I-code for the example statement is shown below:
global-dom    /* Set the flag notifying that the following
                 declared domain is a global domain. */
push-name     /* Push the next string onto the stack. */
long
push-name
a
domain        /* Pop a from the stack, and actually declare
                 a as an integer domain. */
halt          /* Update system relations and return. */
The comments on the right-hand side describe the interpreter actions for the corresponding I-codes. The interpreter maintains a stack for storing and retrieving operands. 'push-name' pushes an operand onto the stack. 'domain' is a collection of C functions that the interpreter calls with predefined arguments, which are obtained by popping the operands from the stack. Note that 'halt' is required at the end of the I-code for the interpreter to stop execution.
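The stack discipline of the interpreter can be illustrated with a toy model. The Python sketch below is hypothetical and is not Relix's C interpreter. It assumes that 'push-name' takes the following I-code word as its operand, and that 'domain' pops the domain name first and the type second. Only the four instructions in the listing above are modeled.

```python
def run_icode(icode):
    """Toy stack interpreter for the I-code listing above; only the four
    instructions shown there are modeled, anything else is rejected."""
    stack, domains = [], {}
    global_flag = False
    pc = 0
    while True:
        op = icode[pc]
        pc += 1
        if op == "push-name":          # the next I-code word is the operand
            stack.append(icode[pc])
            pc += 1
        elif op == "global-dom":       # the following declaration is global
            global_flag = True
        elif op == "domain":           # pop the name, then the type, and declare
            name = stack.pop()
            dtype = stack.pop()
            domains[name] = {"type": dtype, "global": global_flag}
            global_flag = False
        elif op == "halt":             # Relix would update its system relations here
            return domains
        else:
            raise ValueError(f"unsupported I-code: {op!r}")

program = ["global-dom", "push-name", "long", "push-name", "a", "domain", "halt"]
declared = run_icode(program)
```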
Note that the 'Actual' value of domain a is false, which means that a is a virtual domain, and the following Relix statement requires it to be actualized.
> ACT <- [a] in TEST;
The I-code for the example statement is shown below.
In the above I-code, when the interpreter reads project, it will call a C function 'project()' to perform the actual projection. In turn, project() will call yet another function, 'actualize_if_any()', to actualize the virtual domains ('a' in this case). The algorithm for routine project() is as follows:
project(list_R, r_name)
where list_R is a linked list that contains the domains to be projected, and r_name is the name of the relation on which the domains are to be projected.
1. Traverse the attribute list and find whether there are any virtual domains.
(b) Actualize the virtual domain value according to the definition of the virtual domain.
(a) list_R, which points to a list that includes only one item, 'a'.
In actualize_if_any(), the system finds that 'a' is a virtual attribute. Domain a is then actualized by assigning the value 5 to the attribute 'a' of every tuple in TEST.
actualize_if_any() returns the name of the temporary relation to project(), which in turn projects the 'a' domain and returns the result to the system.
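The cooperation between project() and the actualization routine can be sketched in Python. The functions below loosely mirror the two routines described above but are illustrative, not Relix's C code. Virtual domains are modeled as functions of a tuple, and the attributes A and B of TEST and their values are hypothetical. The virtual domain 'a' is given the constant definition 5, as in the text.

```python
def actualize_if_any(rel, attrs, virtual):
    """Return a temporary relation in which every virtual domain named in
    attrs has been computed for each tuple; rel itself is left unchanged."""
    needed = [a for a in attrs if a in virtual]
    if not needed:
        return rel
    return [{**t, **{a: virtual[a](t) for a in needed}} for t in rel]

def project(attrs, rel, virtual):
    """[attrs] in rel, actualizing any virtual domains first."""
    temp = actualize_if_any(rel, attrs, virtual)
    return {tuple(t[a] for a in attrs) for t in temp}

# Hypothetical TEST contents; 'a' is the virtual domain defined as 5.
test = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
virtual = {"a": lambda t: 5}
act = project(["a"], test, virtual)            # ACT <- [a] in TEST
```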
> relation S(a, b);
We have already explained the I-code of the domain declaration (see Section 4.1.2). The I-code of the relation declaration is as follows:
push-name
no-cp-ln      /* Set the flag that only declares,
                 no data input */
push-name     /* Push the next string onto the stack. */
push-name
push-name
b
push-count
              /* number of domains */
push-name
S
relation      /* Pop domain list (a and b) from the stack,
                 pop S from the stack, and declare S as a
                 relation */
halt          /* Update system relations and return. */
To declare a relation data type, we combine the above two cases and add the following grammar rule to yacc:
<nested-domain-declaration> := 'domain' <identifier> <domain-list>
For instance:
> domain S(a, b);
The I-code is also combined from the two cases above:
push-name
no-cp-ln
push-name
push-name
.id           /* Add a system domain .id to refer to
                 the parent relation */
push-name
a
push-name
b
push-count
3
relation
global-dom
S
push-name
relation
push-name
S
domain
end-don-code
halt
Figure 4.2: Comparison of the nested domain declaration with the regular domain declaration and the regular relation declaration
Each nested domain has its declaration entry in both the .dom system table and the .rel system table. The .type in table .dom of any nested domain, i.e., of the relation data type, is set to a constant 'RELATION', which equals 11 in the current version. The following entry in the .dom table is for the nested domain S:
.dom (.domname, .type)
S 11
The following entry in the .rel table is also for the nested domain S:
S 0 0 0
Because nested domain S is itself a relation, its information and that of its domains are stored in another system table, .rd. The following entries are for S:
.rd (.relname, .domname, .dom-pos, .dom-count)
S  .id  0  -3
S  a    1  -3
S  b    2  -3
Note that S has three domains, among which .id is added by the system in order to refer it to the parent relation.
S also has an entry in the system table .nest_dom:
.nest_dom (.domainname, .domainref)
S 0
Initialization of relations can be achieved by supplying the initialization data directly on the command line:
1. Parse the relation identifier and the domain identifiers: in the above case, 'Simple', 'a', and 'b'. Then create a file named 'Simple'.
2. Parse the constants, and save the constants to file 'Simple'.
Since we include a nested domain S here, we need to revise the algorithm to achieve the desired effects:
1. Parse the relation identifier and the domain identifiers, and record the nested subrelations (nested domains). Then create a file named 'Test', and also create files for the subrelations; in this case we have 'S'.
2. Parse the constants. When we meet a curly brace '{', we create a surrogate for the parent attribute and put the corresponding real constants into the corresponding subrelations. For example, for {(1 $), (8, y)} the surrogate is 0, and for {(6,5),(4,9)} the surrogate is 1. Thus,
4.3 Operations
In this section, we present the implementation of operations on nested child relations (nested domains).
Scalar attributes' data types are atomic, as summarized in Figure 2.2. Recall that in Chapter 2 we already listed which scalar operations can be used in both simple reduction and equivalence reduction. Now we will show how they are implemented, using '+', the add operator, as an example.
Suppose we have a relation Order as in Figure 4.3.
Customer  Product  Amount
-------------------------
Ann       W        10
Ann       X        40
Ping      M        20
Sam       Y        30
To obtain the total order Amount of all the customers, we can use our 'red +' operator and apply it to the domain Amount.
> let Total be red + of Amount ;
Domain Total is kept in the system as:
Whenever a Relix statement refers to Total, the system calls Actualise-$-nny() to actualize it.
As we can see, Total is defined on Amount.
The algorithm is as follows:
1. Initialize an accumulator according to the data type of Amount (in this case, long).
2. Scan through each tuple of the relation Order. Extract the value of Amount and add it to the accumulator (recall that the operator of Total is '+').
3. Assign the value in the accumulator to the Total attribute of each tuple.
Thus we can actualize Total and the result is shown in Figure 4.4.
-----------------------------------
Customer  Product  Amount  (Total)
-----------------------------------
Ann       W        10      100
Ann       X        40      100
Ping      M        20      100
Sam       Y        30      100
-----------------------------------
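The three actualization steps for Total can be sketched in Python. This sketch assumes a relation is modelled as a list of dicts, which is an illustration only, since Relix keeps relations in its own files. The function name `actualize_red_plus` is hypothetical and is not the Relix routine. The tuples are the Order data of Figure 4.4.

```python
def actualize_red_plus(relation, operand, result_name):
    """Simple reduction 'red +': every tuple receives the sum over the whole relation."""
    accumulator = 0                       # step 1: initialise (a long in Relix)
    for row in relation:                  # step 2: scan every tuple and accumulate
        accumulator += row[operand]
    # step 3: assign the accumulated value to the new attribute of each tuple
    return [{**row, result_name: accumulator} for row in relation]


order = [
    {"Customer": "Ann", "Product": "W", "Amount": 10},
    {"Customer": "Ann", "Product": "X", "Amount": 40},
    {"Customer": "Ping", "Product": "M", "Amount": 20},
    {"Customer": "Sam", "Product": "Y", "Amount": 30},
]
result = actualize_red_plus(order, "Amount", "Total")
print([row["Total"] for row in result])  # [100, 100, 100, 100]
```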
Furthermore, we would like to know the total amount of the products each customer ordered. The following Relix statement performs this task:
In the system data structure, CusTotal has an item called by-list, which contains Customer. The resulting CusTotal is computed separately for each value of the attributes in this list.
We can actualize CusTotal with the following steps:
3. Scan through the tuples of Order. If the tuple's value of attribute Customer is the same as in the previous tuple, add its Amount to the accumulator. Otherwise, assign the value of the accumulator to the previous tuples and reset the accumulator.
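The scan in step 3 can be sketched in Python. Only step 3 survives in the text, so this sketch assumes that the earlier steps bring tuples with equal by-list values together by sorting on the by-list. The function name `actualize_equiv_plus` and the list-of-dicts representation are hypothetical.

```python
def actualize_equiv_plus(relation, operand, by_list, result_name):
    """Equivalence reduction with '+': one sum per group of equal by-list values."""
    def key(row):
        return tuple(row[d] for d in by_list)

    # Assumption: tuples are first sorted on the by-list so equal keys are adjacent.
    rows = sorted(relation, key=key)
    out, group, accumulator = [], [], 0
    for row in rows:
        if group and key(row) != key(group[-1]):
            # the by-list value changed: hand the sum to the finished group, then reset
            out.extend({**r, result_name: accumulator} for r in group)
            group, accumulator = [], 0
        group.append(row)
        accumulator += row[operand]
    out.extend({**r, result_name: accumulator} for r in group)  # flush the last group
    return out


order = [
    {"Customer": "Ann", "Product": "W", "Amount": 10},
    {"Customer": "Ann", "Product": "X", "Amount": 40},
    {"Customer": "Ping", "Product": "M", "Amount": 20},
    {"Customer": "Sam", "Product": "Y", "Amount": 30},
]
result = actualize_equiv_plus(order, "Amount", ["Customer"], "CusTotal")
print({r["Customer"]: r["CusTotal"] for r in result})  # {'Ann': 50, 'Ping': 20, 'Sam': 30}
```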
In this section, we first present the general algorithms for reduction on nested attributes and then show some examples.
The operators of reductions on nested attributes fall into one of the following groups:
simple reduction    equivalence reduction
red ijoin           equiv ijoin
red ujoin           equiv ujoin
red sjoin           equiv sjoin
General Algorithm
Simple Reduction
1. At the parent relation level, we assign each tuple the constant 0 in the position of the operand domain. For simple reduction, this attribute must have the same value for all tuples in the relation.
2. At the nested relation level, depending on the operator, do ujoin, ijoin or sjoin on the subrelations (which are actually stored in the same physical table).
(a) ujoin: Project all the attributes except .id. The result is the required ujoin of those subrelations. Then append a new .id, with the constant value 0, to keep links with the parent relation.
(b) ijoin: Sort the subrelations by their number of tuples. Then take them one by one according to the value of .id and do ijoin on them. Processing the smallest subrelations first improves join efficiency. The intermediate result may become empty before we reach the last subrelation, and at that point the join can stop.
(c) sjoin: The algorithm is the same as for ijoin, except that we do not need to sort the table.
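A minimal Python sketch of step 2 follows. It assumes that ujoin, ijoin and sjoin on operands sharing all their attributes reduce to set union, intersection and symmetric difference. All subrelations are modelled as (.id, value) rows of one list, standing in for the shared physical table. The function name `red_nested` and the sample table are hypothetical.

```python
from collections import defaultdict


def red_nested(table, op):
    """Simple reduction over all subrelations of one nested domain.

    table: (.id, *values) rows; all subrelations share this one physical table.
    Returns the new subrelation, every tuple carrying the constant .id 0.
    """
    groups = defaultdict(set)                 # .id -> the subrelation it names
    for sid, *values in table:
        groups[sid].add(tuple(values))

    if op == "ujoin":
        # project away .id; duplicate elimination yields the union
        result = {tuple(values) for _, *values in table}
    elif op == "ijoin":
        subrels = sorted(groups.values(), key=len)   # smallest subrelation first
        result = set(subrels[0]) if subrels else set()
        for sub in subrels[1:]:
            if not result:                    # already empty: stop early
                break
            result &= sub
    elif op == "sjoin":
        result = set()
        for sub in groups.values():           # order is irrelevant, so no sort
            result ^= sub
    else:
        raise ValueError(f"unknown operator: {op}")
    return [(0, *values) for values in sorted(result)]


table = [(0, 1), (0, 2), (1, 2), (1, 3)]      # two hypothetical subrelations
print(red_nested(table, "ujoin"))  # [(0, 1), (0, 2), (0, 3)]
print(red_nested(table, "ijoin"))  # [(0, 2)]
print(red_nested(table, "sjoin"))  # [(0, 1), (0, 3)]
```

Sorting only pays off for ijoin, because intersection is the only one of the three folds that can shrink to empty and stop early.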
Equivalence Reduction
Inside Reduction
Examples
In Figure 4.6, we have a relation OrderBook with domains Customer and Order, where Order is a subrelation with domain Product.
OrderBook (Customer, Order)    Order (.id, Product)
The first Relix statement above finds all the products ordered by the customers. The second finds the products that are ordered in each individual order. The third finds all the products ordered in every order by each customer.
To actualize AllProduct, we can run the Relix statement:
> OrderBook1 <- [Customer, AllProduct] in OrderBook ;
System running flow:
3. At the nested relation level, i.e., AllProduct, the operator is red ujoin and the operand is Order. We project [Product] from Order and append a new .id to each tuple of the resulting relation, to keep links with AllProduct in OrderBook. Thus we have a new subrelation AllProduct.
OrderBook1                AllProduct
Customer  AllProduct      .id  Product
Ann       0               0    M
Ping      0               0    W
Sam       0               0    X
                          0    Y
3. At the nested relation level (i.e., IProduct), the operator is red ijoin and the operand is Order. We do ijoin among the different sets of Product values according to .id. These sets are {(W), (X)}, {(W)}, {(M), (W)} and {(Y), (W)}, respectively, and the result is {(W)}. To keep links with IProduct in OrderBook, we append a new .id to each tuple of the resulting relation. Thus we have a new subrelation IProduct.
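Both flows can be checked with a short Python sketch. It covers the parent-level step (constant surrogate 0, then projection on [Customer, new domain]) and the nested-level fold over the Product sets listed above. The function name `reduce_order` is hypothetical, and the mapping of customers to surrogates 0 to 3 is an assumption, since Figure 4.6 did not survive extraction.

```python
from functools import reduce


def reduce_order(order_book, order, combine):
    """Parent level plus nested level of 'red <op> of Order'.

    order_book: (Customer, Order-surrogate) rows; order: (.id, Product) rows.
    """
    # Parent level: every tuple gets the constant surrogate 0; projecting
    # [Customer, <new domain>] then removes duplicate tuples.
    new_book = sorted({(customer, 0) for customer, _ in order_book})
    # Nested level: collect each subrelation by .id and fold them together.
    groups = {}
    for sid, product in order:
        groups.setdefault(sid, set()).add(product)
    combined = reduce(combine, groups.values())
    return new_book, [(0, product) for product in sorted(combined)]


# Product sets by .id as listed in the text; the customer of each surrogate is assumed.
order = [(0, "W"), (0, "X"), (1, "W"), (2, "M"), (2, "W"), (3, "Y"), (3, "W")]
order_book = [("Ann", 0), ("Ann", 1), ("Ping", 2), ("Sam", 3)]

book1, all_product = reduce_order(order_book, order, set.union)
_, i_product = reduce_order(order_book, order, set.intersection)
print(book1)        # [('Ann', 0), ('Ping', 0), ('Sam', 0)]
print(all_product)  # [(0, 'M'), (0, 'W'), (0, 'X'), (0, 'Y')]
print(i_product)    # [(0, 'W')]
```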
To actualize CustProduct, we can use the following Relix statement:
> OrderBook3 <- [Customer, CustProduct] in OrderBook ;
System running flow:
OrderBook2 (Customer, CustProduct)    CustProduct
1. At the parent relation level, copy the value from one of the operands to the new domain.
2. At the subrelation level, call Relix again to obtain the new subrelation.
3. Join the obtained subrelation back to the parent relation, matching the subrelation's .id attribute with the parent relation's new attribute.
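The three steps can be sketched in Python for a ujoin of two nested domains, as in the OldOrd/NewOrd example. This is a sketch under one loud assumption: for each parent tuple, OldOrd and NewOrd carry the same surrogate, so that a flat union of the two tables on (.id, Product) is correct. The function names `nested_ujoin` and `join_back` and the sample data are hypothetical and are not the contents of Figure 4.10.

```python
from collections import defaultdict


def nested_ujoin(order_book, old_ord, new_ord):
    """'Order <- OldOrd ujoin NewOrd' on nested domains, steps 1 and 2.

    order_book: (Customer, OldOrd, NewOrd) rows holding surrogates;
    old_ord, new_ord: (.id, Product) rows.
    Assumption: OldOrd and NewOrd of one parent tuple share the same surrogate.
    """
    # Step 1: parent level, copy the OldOrd surrogate into the new domain Order.
    new_book = [(cust, old, new, old) for cust, old, new in order_book]
    # Step 2: subrelation level; both tables are (.id, Product), so ujoin is a union.
    order = sorted(set(old_ord) | set(new_ord))
    return new_book, order


def join_back(new_book, order):
    """Step 3: link each parent tuple to the Order tuples whose .id equals its surrogate."""
    products = defaultdict(list)
    for sid, product in order:
        products[sid].append(product)
    return [(cust, products[sid]) for cust, _, _, sid in new_book]


book = [("Ann", 0, 0), ("Ping", 1, 1)]       # hypothetical data
old_ord = [(0, "W"), (1, "M")]
new_ord = [(0, "X"), (1, "M")]
new_book, order = nested_ujoin(book, old_ord, new_ord)
print(join_back(new_book, order))  # [('Ann', ['W', 'X']), ('Ping', ['M'])]
```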
Example
In Figure 4.10, we have the relation OrderBook with domains OldOrd, Customer and NewOrd. OldOrd and NewOrd are nested domains.
OrderBook (OldOrd, Customer, NewOrd)    OldOrd (.id, Product)    NewOrd (.id, Product)
Suppose we have:
1. Copy OldOrd to Order. This way, we keep a set of surrogates for Order in the parent relation OrderBook.
2. Call Relix again to get Order, i.e., run "Order <- OldOrd ujoin NewOrd" in Relix. Since OldOrd and NewOrd have the same attributes, .id and Product, we do ujoin on them to get Order.
3. Join the obtained subrelation back to the parent relation, matching the subrelation's .id attribute with the parent relation's attribute Order: OrderBook <- OrderBook
OrderBook5 (Customer, Order)    Order (.id, Product)
General Operation
General operations are stored as strings when they are declared. Suppose we have the relation shown in Figure 4.12 and the following query:
> let BigOrd be "< [ Product ] where Amount > 8 in Order >" ;
OrderBook              Order
Customer  Order        .id  Product  Amount
Ann       0            0    W        9
Ann       1            0    X        6
Ping      2            1    Z        10
Sam       3            2    M        12
                       3    Y        10
                       3    W        7
2. Extract the relational statement from the string and parse it (the parser is described in the next section). The string is altered from "[Product] where Amount > 8 in Order" to "[.id, Product] where Amount > 8 in Order".
3. Call Relix to get the resulting subrelation: "BigOrd <- [.id, Product] where Amount > 8 in Order".
4. Join the resulting subrelation back to the parent relation on .id: "OrderBook <- OrderBook [BigOrd ijoin .id] BigOrd".
5. Update system tables.
OrderBook              BigOrd
Customer  BigOrd       .id  Product  Amount
Ann       0            0    W        9
Ann       1            1    Z        10
Ping      2            2    M        12
Sam       3            3    Y        10
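Steps 2 to 4 can be sketched in Python. This is a simplified illustration, not the Relix implementation. The hypothetical `rewrite` merely prefixes the projection list with .id, standing in for the parser. The rewritten where-clause is passed as a Python predicate rather than evaluated from the string. The join back is reduced to reusing each parent tuple's Order surrogate as its BigOrd surrogate, which works because .id is carried through the query. The function names are hypothetical, and the Order tuples are those of Figure 4.12.

```python
def rewrite(query: str) -> str:
    """Simplified stand-in for the parser: put .id first in the projection list."""
    return query.replace("[", "[.id, ", 1)


def general_operation(order_book, order, attrs, predicate):
    """Steps 3-4 for 'let BigOrd be ...': evaluate on the flat table, then link back.

    order_book: (Customer, Order-surrogate) rows; order: dicts with a '.id' key.
    attrs/predicate stand in for the rewritten projection list and where-clause.
    """
    keep = (".id", *attrs)
    # Step 3: "[.id, Product] where Amount > 8 in Order" over all subrelations at once
    big_ord = sorted({tuple(row[a] for a in keep) for row in order if predicate(row)})
    # Step 4 (simplified): .id was carried through, so each parent tuple's Order
    # surrogate also identifies its BigOrd subrelation (possibly an empty one).
    new_book = [(cust, sid, sid) for cust, sid in order_book]
    return new_book, big_ord


order_book = [("Ann", 0), ("Ann", 1), ("Ping", 2), ("Sam", 3)]
order = [
    {".id": 0, "Product": "W", "Amount": 9},
    {".id": 0, "Product": "X", "Amount": 6},
    {".id": 1, "Product": "Z", "Amount": 10},
    {".id": 2, "Product": "M", "Amount": 12},
    {".id": 3, "Product": "Y", "Amount": 10},
    {".id": 3, "Product": "W", "Amount": 7},
]
print(rewrite("[Product] where Amount > 8 in Order"))
# [.id, Product] where Amount > 8 in Order
new_book, big_ord = general_operation(order_book, order, ["Product"],
                                      lambda row: row["Amount"] > 8)
print(big_ord)  # [(0, 'W'), (1, 'Z'), (2, 'M'), (3, 'Y')]
```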
Parser
In general domain algebra operations, we can write regular relational expressions with some limitations. In particular, we cannot include vertical operations in the quoted relational expression.
Since we call Relix again to get the resulting relation, we need to preprocess the statement. We built a small parser for this purpose.
For example, "[Product] where Amount > 8 in Order" becomes "[.id, Product] where Amount > 8 in Order". The automaton of the parser is shown in Figure 4.14.
Suppose we have "A [a ijoin b] B". The automaton proceeds as follows:
1. The automaton reads 'A'. It stays at the start state. The output is "A".
2. The automaton reads '['. It goes to state 1. The output is "A [".
3. The automaton reads 'a'. It stays at state 1. The output is "A [ a".
4. The automaton reads 'ijoin'. It stays at state 1. The output is "A [ a, .id ijoin".
5. The automaton reads 'b'. It stays at state 1. The output is "A [ a, .id ijoin b".
6. The automaton reads ']'. It goes back to the start state. The output is "A [ a, .id ijoin b, .id]".
7. The automaton reads 'B'. It stays at the start state. The output is "A [ a, .id ijoin b, .id] B".
8. The automaton reads EOF. It stops and returns the output.
Figure 4.14: The parser to parse the embedded general relational expression
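The two-state automaton traced above can be sketched in Python. It is a sketch under assumptions, not the Relix parser. Tokens are split with a regular expression, and only ijoin, ujoin and sjoin are treated as operators inside brackets. The .id is appended after each operand, as in the trace, rather than placed first as in the "[.id, Product]" example; attribute order in a projection list does not change the result. The names `add_id` and `JOIN_OPERATORS` are hypothetical.

```python
import re

START, IN_BRACKETS = 0, 1
# Assumption: these are the only operators that separate operands inside '[...]'.
JOIN_OPERATORS = {"ijoin", "ujoin", "sjoin"}


def add_id(expression: str) -> str:
    """Insert .id into every bracketed list, following the two-state automaton."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", expression)
    out = []
    state = START
    for token in tokens:
        if state == START:
            out.append(token)           # copy tokens outside brackets unchanged
            if token == "[":
                state = IN_BRACKETS
        elif token in JOIN_OPERATORS:
            if out[-1] != "[":
                out[-1] += ","
            out.extend([".id", token])  # "a ijoin" becomes "a, .id ijoin"
        elif token == "]":
            if out[-1] != "[":
                out[-1] += ","
            out.append(".id]")          # "b ]" becomes "b, .id]"
            state = START
        else:
            out.append(token)           # an operand inside the brackets
    return " ".join(out)


print(add_id("A [a ijoin b] B"))
# A [ a, .id ijoin b, .id] B
print(add_id("[Product] where Amount > 8 in Order"))
# [ Product, .id] where Amount > 8 in Order
```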
Chapter 5
Conclusion
Nested relations have been explored thoroughly in past decades, with research focused mainly on nesting and unnesting [Jae82][Fis85][Kor89][TakSS]. In our approach, we build nested relations upon flat relations. We show that flat relations are powerful enough to model nested relations and to support nested relation queries. The purpose of this thesis is to begin integrating nested relations into a relational database programming language (Relix) by integrating the relational algebra into the domain algebra.
5.1 Summary
We built our nested relation model upon the original Relix database model, which is powerful enough to support nested relations. No modifications have been made to the original database engine itself. However, some extensions were made to facilitate the integration and to provide new features.
• A new system attribute .id has been added to Relix, which provides a way of linking the parent relation to its nested relations.
Our implementation showed that Relix is powerful enough to include nested relations, and that nested relations can be added to the system conveniently. The relational operations added to the domain operations, such as ujoin, sjoin and ijoin, function well.
However, the surrogate mechanism we used is rather simple. The surrogates only keep links between nested child relations and the parent relation, and we have not been able to include more information in them. No large-scale tests have been done, since they are beyond the scope of this M.Sc. thesis.
• Implementing multiple nesting and recursive nesting. To date, we have only implemented one level of nesting, which provides a prototype for multiple nesting. In theory, nested relations can be built with arbitrarily many levels.
• Fully integrating the relational algebra into the domain algebra. Only part of the relational algebra has been integrated into the domain algebra to date. Further work can be done on functional mapping and partial functional mapping on nested relations.
[Cod70] E. F. Codd. A Relational Model of Data for Large Shared Data Banks. Communications of the ACM 13(6), June 1970, pp. 377-387.