Summary-Booklet - Jenna Tutorials
SEMESTER 1, 2014
The combination of the primary keys of the participating entity types forms a super key of a
relationship.
Relationship Set Schema: relationship name, role names, relationship attributes and their
types, key.
Key Constraints: if, for a particular participant entity type, each entity participates in at most one
relationship, the corresponding role is a key of the relationship type, e.g. the employee role is unique in workIn.
Participation Constraint: if every entity participates in at least one relationship, a participation
constraint holds. A participation constraint of entity type E having role ρ in relationship
type R states that for every e in E there is an r in R such that ρ(r) = e. (Representation in E-R diagram: thick
line.)
Cardinality Constraints: generalisation of key and participation constraints. A cardinality
constraint for the participation of an entity set E in a relationship R specifies how often an entity of
set E participates in R at least (minimum cardinality) and at most (maximum cardinality).
Weak Entities: an entity type that does not have a primary key of its own, e.g. child from parents, payment of
loan. The primary key of a weak entity type is formed by the primary key of the strong entity type(s)
on which the weak entity type is existence dependent, plus the weak entity type's discriminator.
Constraints on ISA Hierarchies
- Overlap Constraints
  Disjoint: an entity can belong to only one lower-level entity set
  Overlapping: an entity can belong to more than one lower-level entity set
- Covering Constraints
  Total: an entity must belong to one of the lower-level entity sets
  Partial (the default): an entity need not belong to one of the lower-level entity sets
2. Join:
You can join two or more tables using conditions on their attributes.
Types of join: NATURAL JOIN, INNER JOIN and OUTER JOIN
R NATURAL JOIN S
R INNER JOIN S ON <join condition>
R INNER JOIN S USING (<list of attributes>)
R LEFT OUTER JOIN S
R RIGHT OUTER JOIN S
R FULL OUTER JOIN S
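The join forms above can be tried out on a toy database. The sketch below uses Python's built-in sqlite3 module; the tables R and S and their columns a, b, c are made up for illustration. RIGHT and FULL OUTER JOIN are left out because SQLite releases before 3.39 do not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (a INTEGER, b TEXT);
    CREATE TABLE S (a INTEGER, c TEXT);
    INSERT INTO R VALUES (1, 'r1'), (2, 'r2'), (3, 'r3');
    INSERT INTO S VALUES (1, 's1'), (2, 's2'), (4, 's4');
""")

def run(sql):
    """Execute a query and return all result rows."""
    return conn.execute(sql).fetchall()

# The three inner-join spellings match rows on the shared column a.
natural = run("SELECT a, b, c FROM R NATURAL JOIN S ORDER BY a")
inner_on = run("SELECT R.a, b, c FROM R INNER JOIN S ON R.a = S.a ORDER BY R.a")
inner_using = run("SELECT a, b, c FROM R INNER JOIN S USING (a) ORDER BY a")

# The left outer join also keeps R rows without a partner, padded with NULL.
left_outer = run("SELECT R.a, b, c FROM R LEFT OUTER JOIN S ON R.a = S.a ORDER BY R.a")
```

Here the row with a = 3 appears only in left_outer, with None in the position of c.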
4. Set operator:
The set operations union, intersect, and except (Oracle:
minus) operate on relations and correspond to the relational
algebra operations union, intersection, and set difference.
Example: (select customer_name from depositor)
union
(select customer_name from borrower)
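A small runnable version of the example, as a sketch using Python's sqlite3 module. The customer names are invented sample data, and the parentheses around each SELECT are dropped because SQLite does not accept them in compound queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE depositor (customer_name TEXT);
    CREATE TABLE borrower  (customer_name TEXT);
    INSERT INTO depositor VALUES ('Hayes'), ('Jones'), ('Smith');
    INSERT INTO borrower  VALUES ('Jones'), ('Smith'), ('Adams');
""")

def names(op):
    """Combine depositor and borrower names with the given set operator."""
    sql = (
        "SELECT customer_name FROM depositor "
        f"{op} "
        "SELECT customer_name FROM borrower "
        "ORDER BY customer_name"
    )
    return [row[0] for row in conn.execute(sql)]

in_either = names("UNION")       # customers with an account or a loan
in_both = names("INTERSECT")     # customers with both
only_deposit = names("EXCEPT")   # customers with an account but no loan
```

All three operators remove duplicates; UNION ALL would keep them.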
The following checks for each student S whether there is at least one entry in the Enrolled table for that student in INFO2120:
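One common way to express a check like this is a correlated EXISTS subquery. The sketch below assumes tables Student(sid, name) and Enrolled(sid, uos_code); these schemas and the sample rows are illustrative guesses, not the course's actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student  (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Enrolled (sid INTEGER REFERENCES Student(sid), uos_code TEXT);
    INSERT INTO Student  VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cho');
    INSERT INTO Enrolled VALUES (1, 'INFO2120'), (2, 'COMP2129'), (3, 'INFO2120');
""")

# For each student S, the subquery is evaluated with S.sid fixed;
# EXISTS is true as soon as one matching Enrolled row is found.
in_info2120 = [row[0] for row in conn.execute("""
    SELECT S.name
    FROM Student S
    WHERE EXISTS (SELECT 1
                  FROM Enrolled E
                  WHERE E.sid = S.sid AND E.uos_code = 'INFO2120')
    ORDER BY S.name
""")]
```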
Grouping
A group is a set of tuples that have the same value for all attributes in the grouping list.
NOTE: every attribute in the SELECT clause that is not inside an aggregate function must be in the GROUP BY clause as well.
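A minimal grouping example, sketched with Python's sqlite3 module. The Enrolled table, its columns and the marks are assumed sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Enrolled (sid INTEGER, uos_code TEXT, mark INTEGER);
    INSERT INTO Enrolled VALUES
        (1, 'INFO2120', 80), (2, 'INFO2120', 60), (1, 'COMP2129', 70);
""")

# uos_code is the only non-aggregated attribute in SELECT,
# so it is the attribute that appears in GROUP BY.
per_unit = conn.execute("""
    SELECT uos_code, COUNT(*) AS students, AVG(mark) AS avg_mark
    FROM Enrolled
    GROUP BY uos_code
    ORDER BY uos_code
""").fetchall()
```

Each result row describes one group: the unit code, how many enrolments it has, and their average mark.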
Relational Division
Definition
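Relational division R / S returns the values that are paired in R with every value in S. A minimal sketch with Python sets; the helper name divide and the enrolment data are invented for illustration.

```python
def divide(r, s):
    """Return the x values that appear in r together with every y in s."""
    candidates = {x for x, _ in r}
    return {x for x in candidates if all((x, y) in r for y in s)}

# R(sid, uos_code): who is enrolled in what.
enrolled = {
    (1, "INFO2120"), (1, "COMP2129"),
    (2, "INFO2120"),
    (3, "COMP2129"),
}
# S(uos_code): the units of interest.
required = {"INFO2120", "COMP2129"}

took_all = divide(enrolled, required)   # students enrolled in every required unit
```

In SQL the same question is usually phrased with a double NOT EXISTS or with GROUP BY and HAVING COUNT.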
Table Decomposition
A decomposition of R consists of replacing R by two or more relations such that: Each new relation scheme contains a
subset of the attributes of R (and no attributes that do not appear in R), every attribute of R appears as an attribute of
one of the new relations, and all new relations differ. Example: R ( A, B, C, D ) with FDs: {A -> B D and B -> C}.
Overall Design Process: Consider a proposed schema | Find out application domain properties expressed as
functional dependencies | See whether every relation is in BCNF | If not, use a bad FD to decompose one of the
relations; start with partial dependencies (Replace the original relation by its decomposed tables) | Repeat the above,
until you find that every relation is in BCNF.
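The "is every relation in BCNF?" step can be mechanised roughly as follows. This is a sketch under assumptions: FDs are encoded as (left side, right side) pairs of Python sets, and the example is the relation R(A, B, C, D) with A -> BD and B -> C from the decomposition notes.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """FDs that are non-trivial and whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not relation <= closure(lhs, fds)
    ]

R = set("ABCD")
FDS = [(set("A"), set("BD")), (set("B"), set("C"))]

bad = bcnf_violations(R, FDS)   # B -> C: B is not a superkey of R

# Decompose on the bad FD: one relation for the FD itself,
# one for everything except the attributes it determines.
lhs, rhs = bad[0]
r1 = lhs | rhs                  # attributes B, C
r2 = R - (rhs - lhs)            # attributes A, B, D
```

A full implementation would also project the FDs onto r1 and r2 and repeat the check on each; that step is omitted here.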
Making it Precise
It is essential that all decompositions used to deal with redundancy be lossless!
Dependency-preserving: If R is decomposed into S and T, then all FDs that were given to hold on R must also hold
on S and/or T. (Dependency preserving does not imply lossless join & vice-versa!)
Must consider whether all FDs are preserved. If a dependency-preserving decomposition into BCNF is not possible
(or unsuitable, given typical queries), should consider decomposition into 3NF.
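To see what a lossy decomposition does to data, the sketch below projects a tiny relation onto two attribute sets and joins the pieces back together. The helper names and the sample rows are illustrative assumptions. Note that a check on one instance can expose a lossy decomposition but cannot prove that a decomposition is lossless in general; that needs the FDs.

```python
def project(rows, attrs):
    """Projection onto attrs; duplicates collapse because the result is a set."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(left, right):
    """Combine every pair of rows that agree on all shared attributes."""
    joined = set()
    for lrow in left:
        for rrow in right:
            dl, dr = dict(lrow), dict(rrow)
            if all(dl[a] == dr[a] for a in dl.keys() & dr.keys()):
                joined.add(frozenset({**dl, **dr}.items()))
    return joined

def rejoins_exactly(rows, attrs1, attrs2):
    """True if projecting onto attrs1 and attrs2 and rejoining gives rows back."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, attrs1), project(rows, attrs2)) == original

# Sample instance of R(A, B, C) in which A -> B holds.
rows = [
    {"A": 1, "B": "x", "C": 10},
    {"A": 2, "B": "x", "C": 20},
]
split_on_a = rejoins_exactly(rows, "AB", "AC")   # pieces share A
split_on_b = rejoins_exactly(rows, "AB", "BC")   # pieces share B
```

Splitting on B fails because both rows have B = 'x', so the join pairs every A with every C and produces two spurious rows.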
Candidate Key: Main Idea -> only allow FDs of the form of a key constraint. Each non-key field is functionally dependent
on every candidate key. Candidate Key Identification: Identifying all FDs that hold on our data set | Then reasoning
over those FDs using a set of rules on how we can combine FDs to infer candidate keys | Or alternatively, using
these FDs to verify whether a given set of attributes is a candidate key or not.
From FDs to Keys: Candidate keys are defined by functional dependencies | Consequently, FDs help us to identify
candidate keys. From the Attribute Closure to Keys: The set of Functional Dependencies can be used to find
candidate keys.
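A minimal sketch of the attribute closure computation and a brute-force candidate key search built on it. The encoding of FDs as (left side, right side) pairs of Python sets is an assumption, and the search tries every attribute subset, so it only suits small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Left side already in result but right side not yet: add it.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure covers the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if relation <= closure(k, fds):
                keys.append(k)
    return keys

R = set("ABCD")
FDS = [(set("A"), set("BD")), (set("B"), set("C"))]

b_plus = closure({"B"}, FDS)      # B determines only B and C
keys = candidate_keys(R, FDS)     # A alone determines every attribute
```

Because subsets are tried in order of increasing size, any superkey that contains an already-found key can be skipped, which leaves only the minimal ones.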
e.g. triggers. Let's say you have a database containing many VARCHAR columns and you don't want to rewrite the
same definition again and again. You can then use a domain constraint to create a VARCHAR domain which is available to all the
tables in the database, and a check is made to verify that each value is within the limit. E.g.
CREATE DOMAIN <domain name> <data type> CHECK (VALUE IN (<allowed values>));
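SQLite has no CREATE DOMAIN, but a column-level CHECK illustrates the same kind of value restriction. A sketch using Python's sqlite3 module; the Enrolled table and the list of grades are assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Enrolled (
        sid   INTEGER,
        grade TEXT CHECK (grade IN ('HD', 'D', 'CR', 'P', 'F'))
    )
""")

conn.execute("INSERT INTO Enrolled VALUES (1, 'HD')")      # value is in the list

try:
    conn.execute("INSERT INTO Enrolled VALUES (2, 'XX')")  # value is not in the list
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Enrolled").fetchone()[0]
```

With a real domain (e.g. in PostgreSQL) the CHECK would be written once and reused by every column declared with that domain.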
A DEFERRABLE constraint lets the transaction run to its end and is checked at commit time; a NOT DEFERRABLE constraint is checked
immediately, every time the database is modified.
ASSERTIONs are schema objects and are static integrity constraints that make the database always satisfy a
condition. E.g. CREATE TABLE student (Sid INTEGER PRIMARY KEY, name VARCHAR);
CREATE ASSERTION checksid CHECK ((SELECT COUNT(Sid) FROM student) <= 100); checks that the number of students does not
exceed 100.
One example of a dynamic integrity constraint is the trigger. A trigger is a statement that fires automatically when some
specific modification occurs on the database. E.g.
CREATE TRIGGER <trigger name>
{AFTER | BEFORE} INSERT OR UPDATE OF <attribute> ON <table name> BEGIN <action> END;
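A concrete trigger, sketched in SQLite's dialect through Python's sqlite3 module. The audit_log table and the trigger name log_new_student are hypothetical; trigger syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student   (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE audit_log (sid INTEGER, action TEXT);

    -- Fires once for every row inserted into student.
    CREATE TRIGGER log_new_student
    AFTER INSERT ON student
    BEGIN
        INSERT INTO audit_log VALUES (NEW.sid, 'inserted');
    END;
""")

conn.execute("INSERT INTO student VALUES (1, 'Ann')")
log = conn.execute("SELECT sid, action FROM audit_log").fetchall()
```

The insert into student never mentions audit_log; the log row is written by the trigger.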
An index is an access path to efficiently locate row(s) via search key fields without having
to scan the entire table.
Primary index: an index whose search key specifies the sequential order of the file. Also called main index or integrated
index.
Secondary index: an index whose structure is separated from the data file and whose search key typically specifies
an order different from the sequential order of the file.
Clustered index
Types of Indexes:
Tree-based Indexes: B+-Tree
o Very flexible; the only indexes that support point queries, range queries and prefix searches
Hash-based Indexes
o Fast for equality searches
Special Indexes
o Such as Bitmap Indexes for OLAP or R-Trees for spatial databases
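The effect of an index on the access path can be observed with SQLite's EXPLAIN QUERY PLAN. A sketch using Python's sqlite3 module; the student table, the suburb column and the index name idx_suburb are assumptions, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, suburb TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(i, f"name{i}", f"suburb{i % 50}") for i in range(1000)],
)

def plan(sql):
    """Return SQLite's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " ".join(row[-1] for row in rows)   # last column holds the plan text

query = "SELECT * FROM student WHERE suburb = 'suburb7'"

before = plan(query)    # no index on suburb yet: the whole table is scanned
conn.execute("CREATE INDEX idx_suburb ON student (suburb)")
after = plan(query)     # the plan now names idx_suburb
```

SQLite indexes are B-trees, so the same index also serves range conditions such as suburb BETWEEN 'a' AND 'm'.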
Grammar
Elements + attributes
XML Schema
<xsd:simpleType name="Score">
<xsd:restriction base="xsd:integer">
<xsd:minInclusive value="0"/>
<xsd:maxInclusive value="100"/>
</xsd:restriction>
</xsd:simpleType>
Structure and Typing
Elements, attributes, simple and complex types,
groups
Supports includes relationships - inheritance
Specified as attribute of the document elements
such that all A1, ..., Am are already in the set of attributes result, but C is not.
Add C to the set result.
3. Repeat step 2 until no more attributes can be added to result
4. The set result is the correct value of X^+ (the closure of attributes)
- To find all candidate keys, look at each set of attributes K and calculate the attribute closure K^+
  - If K^+ contains all columns, K is a superkey
  - Check each subset of K to see if it is also a superkey
  - Find the candidate keys (the smallest subsets that are still superkeys)
  - Pick one candidate key to be the primary key

6. Normalisation ('decomposing' into normal forms)
- 1NF: all attributes are atomic (no multivalued or composite attributes)
- 2NF: no partial dependencies (not important)
- 3NF: no transitive dependencies (not important)
- BCNF: no remaining anomalies from functional dependencies (good!)
  - The only non-trivial FDs are key constraints
  - a trivial FD is X --> Y where Y is a subset of X (you determine yourself)
  - formally: for every FD A --> B, either the FD is trivial or A is a superkey (primary key, candidate key, or more)
- 4NF: no multivalued dependencies (not important)
- 5NF: no remaining anomalies (not important)
- Decomposition attributes
  - Lossless-join decomposition
    - When you join the decomposed relations, you get the original relation
    - A decomposition that is not lossless-join doesn't usually lose whole rows; it could mean that meaningless rows are added
    - If R(A, B, C) has A -> B, then the decomposition into (A, B) and (A, C) is always lossless-join
  - Dependency-preserving decomposition
    - Every dependency from the original is still in the decomposed relations
    - Often, we say every original dependency is in exactly ONE of the decomposed relations

7. Serialisability
- ACID
  - Atomicity (all or nothing)
  - Consistency (db always in valid state: triggers, cascading deletes, CHECKs, etc)
  - Isolation (transactions do not interfere)
  - Durability (committing MEANS committed: once a commit returns, the db can recover to that commit after any crash)
- a Transaction is a list of SQL statements that are ACID, one logical 'unit of work'
  - they happen in order, together; if one fails they all fail
- 'Auto-commit' means every SQL statement is an entire transaction
- Serialisability means interleaved execution is the same as batch execution: given 2 transactions, the final state is the same as if they had run one after the other in some order
- dirty read (reading uncommitted data, WR conflict)
  T1: R(A),W(A),                 R(B),W(B),Abort
  T2:            R(A),W(A),Commit
- unrepeatable read (two reads in a transaction give different results, RW conflict)
  T1: R(A),                 R(A),W(A),Commit
  T2:       R(A),W(A),Commit
- lost update (overwriting uncommitted data, WW conflict)
  T1: W(A),                 W(B),Commit
  T2:       W(A),W(B),Commit
- 2-phase-locking ensures serializable executions, but can mean some operations are blocked
  - Before reading, take shared lock
  - Before writing, take exclusive lock
  - Hold lock until transaction commits/aborts

8. Indexing
- records are stored in pages, each page contains a maximum number of records
- an index is a type of page
- Types of indexes
  - sorted (uncommon, tree is better)
  - tree (like sorted but better, good for range, equality and prefix searches)
    - multileveled, e.g. 2 levels mean records are 2 indexes away at most
  - hash (good for equality and that's it)
  - special (e.g. bitmap indexes, R-trees for spatial data)
- With an index, selecting takes less time, inserting takes more time
- A covering index (for a query) means all fields in the query are indexed, so the records are not accessed at all
- An "access path" is the journey you take to reach the data (e.g. query --> table scan --> record)
- A "search key" is a sequence of attributes that are indexed, including the primary key
- Properties of indexes
  1. Main [or primary] (indexes contain the whole row) vs secondary (indexes contain a pointer)
  2. Unique (index over a candidate key) vs nonunique
  3. Clustered (data records are ordered the same way as indexes) vs unclustered
    - There can be at most one clustered index on a table
    - Clustered is good for "range searches" (key is between two limits)
  4. Single- vs multi-attribute
- CREATE TABLE usually creates a unique, clustered, main index on the primary key
- CREATE INDEX usually creates a secondary, unclustered index
  - CREATE INDEX name ON table (field)
- Space and time problems
  - how much space per row? add up space per field (e.g. 20 byte record)
  - how many records per block? divide the space of a block by this amount (calculate records per block, round down!) (e.g. 4K block)
  - how many blocks? divide total # of records by # of records per block, round up (e.g. 50 blocks)
  - how long does the query take? multiply the number of blocks by the time an access takes (reading a disk block into memory)
  - assumptions:
    - if a field has 3 possible values, there are an equal number of records with each value
    - 10% of the records with A = a also have B = b

9. OLAP
- OLAP stands for "online analytical processing"
- Data warehousing
  - db needs to be optimised for SELECT queries - UPDATE, DELETE etc can be slow
  - LOTS of tricks used: indexes, redundant fields, etc
  - maximise (to a point) redundancy
- Star schema
  - 1 central fact table, n dimension tables referenced by FKs from the fact table
  - for each dimension, we have a hierarchy
  - getting totals and subtotals for the hierarchies:
    - CUBE(x, y, z) does GROUP BY (nothing), GROUP BY (every combination)
    - ROLLUP(x, y, z) does GROUP BY (x, y, z), GROUP BY (x, y), GROUP BY (x), GROUP BY (nothing)
- WINDOW queries
  SELECT AGG() OVER name FROM ...
  WINDOW name AS (
    [ PARTITION BY attributelist ]  (attributes to partition by)
    [ ORDER BY attributelist ]  (attributes to order by)
    [ (RANGE|ROWS) BETWEEN v1 PRECEDING AND v2 FOLLOWING ] )  (rows to look at)

10. XML
- Not summarised
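The block arithmetic from the "Space and time problems" notes in section 8 can be worked through as below. The 20-byte record and the 4K block come from the notes; the table size of 10,000 records and the 10 ms per block access are assumptions chosen for illustration.

```python
import math

record_bytes = 20         # space per row, from the notes' example
block_bytes = 4096        # "4K block"
total_records = 10_000    # assumed table size
access_ms = 10            # assumed time to read one block from disk

# A record cannot straddle two blocks here, so round down.
records_per_block = block_bytes // record_bytes

# A partly filled last block still has to be read, so round up.
blocks = math.ceil(total_records / records_per_block)

# A full table scan reads every block once.
full_scan_ms = blocks * access_ms
```

Under these assumptions a block holds 204 records, the table occupies 50 blocks, and a full scan costs about 500 ms.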