Chapter 4
Database Design
Agenda
• From Logical Model to Physical Database
• Database Performance Design
• Denormalization
• Views
• Data Definition Language
• Temporal Data Support
• Questions
Terminology Summary
http://datatechnologytoday.wordpress.com/2011/11/19/an-introduction-to-database-design-from-logical-to-physica
Transform Entities to Tables
• First general step:
– Map each entity in the logical data model to a
table in the database
• Things may, or may not, be that easy
– Denormalization?
PAYMENT
http://craigsmullins.com/dbta_072.htm
These data types allow database systems to store different kinds of information efficiently.
http://craigsmullins.com/dbta_043.htm
Default
Column Ordering
• Sequence columns based on logging. For example:
– Infrequently updated non-variable columns first
– Static (infrequently updated) variable columns next
– Frequently updated columns last
– Columns frequently modified together placed next to each other
CUST ID   FIRST NAME   LAST NAME   ADDRESS   ACCT BAL
http://craigsmullins.com/dbta_110.htm
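The column ordering rule can be sketched as a simple sort. This is only an illustration: the Column class, the order_columns function, and the flags assigned to each column (which ones are variable length, which are updated often) are assumptions made for the example, not facts about any particular schema or DBMS.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    variable_length: bool      # assumed flag, e.g. VARCHAR vs CHAR/INTEGER
    frequently_updated: bool   # assumed flag, taken from workload knowledge


def order_columns(columns):
    """Order columns: static fixed-length, then static variable, then hot columns."""
    def rank(col):
        if col.frequently_updated:
            return 2                       # frequently updated columns go last
        return 1 if col.variable_length else 0
    # sorted() is stable, so columns with the same rank keep their given order.
    return sorted(columns, key=rank)


# Hypothetical customer table, deliberately listed in a poor order.
columns = [
    Column("ACCT_BAL", variable_length=False, frequently_updated=True),
    Column("FIRST_NAME", variable_length=True, frequently_updated=False),
    Column("CUST_ID", variable_length=False, frequently_updated=False),
    Column("LAST_NAME", variable_length=True, frequently_updated=False),
    Column("ADDRESS", variable_length=True, frequently_updated=False),
]

ordered_names = [c.name for c in order_columns(columns)]
print(ordered_names)
```

Under these assumed flags the result matches the CUST ID, FIRST NAME, LAST NAME, ADDRESS, ACCT BAL layout, with the frequently updated balance at the end of the row.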
Free Space
[Figure: B-tree index, Level 2, showing two nonleaf pages with key ranges 53 : 98 and 108 : 302]
[103]
/ \
[101, 102] [104, 105]
In this B-tree:
• The root node contains 103, which splits the keys into two ranges.
• The left subtree contains Employee IDs less than 103 (i.e., 101,
102).
• The right subtree contains Employee IDs greater than 103 (i.e.,
104, 105).
When you search for 104, the search starts at the root (103):
• Since 104 is greater than 103, the search moves to the right
subtree.
• In the right subtree, it finds the correct node (104), completing
the search efficiently.
The B-tree ensures that the search for 104 is done in logarithmic time,
reducing the number of comparisons needed as the tree grows.
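The search described above can be sketched in a few lines. This is a minimal, read-only illustration of B-tree lookup on the three-node example; the Node class and search function are hypothetical names, and a real index would also handle insertion, page splits, and disk pages.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys stored in this node
        self.children = children or []    # empty list means this is a leaf


def search(node, key, depth=0):
    """Return the depth at which key is found, or None if it is absent."""
    i = bisect_left(node.keys, key)       # first position with keys[i] >= key
    if i < len(node.keys) and node.keys[i] == key:
        return depth
    if not node.children:                 # reached a leaf without a match
        return None
    return search(node.children[i], key, depth + 1)


# The example tree: root [103] with leaves [101, 102] and [104, 105].
root = Node([103], [Node([101, 102]), Node([104, 105])])

print(search(root, 104))   # found one level below the root
print(search(root, 103))   # found in the root itself
print(search(root, 999))   # not present
```

Each step discards every subtree except one, which is why the number of nodes visited grows with the height of the tree rather than with the number of keys.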
Bitmap Index
Structure of a Bitmap Index
Consider a simple table called Employees with a Gender column:
Employee_ID   Name    Gender
101           Alice   Female
102           Bob     Male
103           Carol   Female
104           Dave    Male
105           Eve     Female
For the Gender column, a bitmap index might look like this:
• Bitmap for 'Female': 1 0 1 0 1 (where 1 indicates that the
corresponding row contains 'Female')
• Bitmap for 'Male': 0 1 0 1 0 (where 1 indicates that the
corresponding row contains 'Male')
A bitmap index on a ten-row table could be stored as bit strings:
'Male' 1000011101
'Female' 0110000010
These strings indicate that rows 1, 6, 7, 8, and 10 are males; rows 2, 3, and 9
are females.
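A bitmap index of this kind can be sketched as one list of bits per distinct value. The function names build_bitmap_index and rows_matching are made up for this illustration, and real systems store compressed bitmaps rather than Python lists.

```python
def build_bitmap_index(values):
    """Build one bitmap (list of 0/1) per distinct value in the column."""
    index = {}
    for row, value in enumerate(values):
        bits = index.setdefault(value, [0] * len(values))
        bits[row] = 1
    return index


def rows_matching(bitmap):
    """Return the 1-based row numbers whose bit is set."""
    return [i + 1 for i, bit in enumerate(bitmap) if bit]


# Gender column of the five-row Employees table.
genders = ["Female", "Male", "Female", "Male", "Female"]
index = build_bitmap_index(genders)
print(index["Female"])   # bitmap for 'Female'
print(index["Male"])     # bitmap for 'Male'

# Decoding the ten-row bit strings from the second example.
male_rows = rows_matching([int(ch) for ch in "1000011101"])
female_rows = rows_matching([int(ch) for ch in "0110000010"])
print(male_rows, female_rows)
```

Because each value has its own bitmap, predicates can be combined with bitwise AND and OR before any table row is read.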
Other Types of Indexes
• Reverse Key Index
– A type of B-tree index in which the bytes of each
indexed column are stored in reverse order. This
spreads sequential key values across the index, which
is particularly useful for reducing contention on
frequently accessed keys.
• Partitioned Index
– A B-tree index that specifies how to break up the index
(and perhaps the underlying table) into separate
chunks, or partitions, to enhance performance and
availability
• Ordered Index
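The effect of a reverse key index can be illustrated by reversing the bytes of a few sequential keys. This is a sketch of the idea only: the reverse_key function and the 4-byte big-endian encoding are assumptions chosen for the example, not the storage format of any specific DBMS.

```python
def reverse_key(key: int, width: int = 4) -> bytes:
    """Encode key as big-endian bytes, then reverse the byte order."""
    return key.to_bytes(width, "big")[::-1]


sequential = [1001, 1002, 1003, 1004]

# Plain encoding: the keys share their leading bytes, so they sort next
# to each other and land in the same part of the index.
plain = [k.to_bytes(4, "big") for k in sequential]

# Reversed encoding: the fast-changing low byte now comes first, so
# consecutive keys differ in their leading byte and are spread apart.
reversed_keys = [reverse_key(k) for k in sequential]

for k, p, r in zip(sequential, plain, reversed_keys):
    print(k, p.hex(), r.hex())
```

The trade-off is that reversed keys no longer sort in key order, so an index built this way is a poor fit for range scans.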
Partitioning
Hashing
[Figure: keys (e.g. LAST_NAME) are passed through a hash algorithm that maps them to storage locations, with an overflow area. Keys shown: BLAKE, JACKSON, JOHNSON, MULLINS, NEELD]
def hash_function(user_id, table_size):
    # Map an integer key to a storage slot in the range 0..table_size-1.
    return user_id % table_size
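The hashing diagram uses last names as keys and includes an overflow area, which can be sketched as follows. The hash_key function (sum of character codes modulo the table size) and the place function are hypothetical and chosen only for readability; production hash functions are designed to distribute keys far more evenly.

```python
def hash_key(name, table_size):
    """Toy hash for string keys: sum of character codes modulo table size."""
    return sum(ord(ch) for ch in name) % table_size


def place(names, table_size):
    """Assign each key to its hashed slot; collisions go to an overflow list."""
    slots = [None] * table_size
    overflow = []
    for name in names:
        slot = hash_key(name, table_size)
        if slots[slot] is None:
            slots[slot] = name
        else:
            overflow.append(name)   # slot already taken, so use overflow
    return slots, overflow


names = ["BLAKE", "JACKSON", "JOHNSON", "MULLINS", "NEELD"]
slots, overflow = place(names, table_size=4)
print(slots)
print(overflow)
```

With five keys and four slots, at least one key must end up in overflow, which shows why the table size and hash function need to match the expected data volume.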
Techniques:
• Interleaving can involve several methods, such as:
• Row-Column Interleaving: Data is distributed across rows and
columns, which can improve access patterns for specific
queries.
• Block Interleaving: Blocks of data are distributed in a way that
balances load across storage devices or nodes.
[Figure: Database File; legend: Table 1, Table 2]
Denormalization
• Prejoined Tables - when the cost of joining is prohibitive
• Report Tables - for specialized critical reports (e.g. CEO)
• Mirror Tables - when two types of environments require concurrent access to
the same data (OLTP vs DSS)
• Split Tables - when distinct groups/apps use different parts of the same table
– Splitting columns across two tables for long variable character columns.
• Combined Tables - to eliminate one-to-one relationships
• Redundant Data - to reduce the number of joins for a single column (e.g.
definitional, CA to California)
• Repeating Groups - to reduce overall I/O (& possibly DASD)
• Derivable Data - to eliminate calculations & aggregations
• Speed Tables - to support hierarchies
• Physical Implementation Needs – e.g., to reduce page size
http://www.tdan.com/view-articles/4142
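As one concrete case from the list, a prejoined table can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows are invented for this illustration; the point is only that the join is paid once, when the prejoined table is built, instead of on every query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized base tables (hypothetical schema).
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE payment  (pay_id INTEGER PRIMARY KEY,
                           cust_id INTEGER, amount REAL);

    INSERT INTO customer VALUES (1, 'BLAKE'), (2, 'NEELD');
    INSERT INTO payment  VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);

    -- Prejoined table: the join result is stored as its own table.
    CREATE TABLE customer_payment AS
        SELECT c.cust_id, c.last_name, p.pay_id, p.amount
        FROM customer c
        JOIN payment p ON p.cust_id = c.cust_id;
""")

# Queries against the prejoined table need no join at read time.
rows = conn.execute(
    "SELECT last_name, amount FROM customer_payment ORDER BY pay_id"
).fetchall()
print(rows)
conn.close()
```

The cost is that customer_payment duplicates data and must be refreshed whenever the base tables change, which is the usual price of denormalization.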
When to Denormalize
The only reason to denormalize, ever:
• To achieve optimal performance!
• If the database design achieves satisfactory performance
when fully normalized, then there is no need to denormalize.
The Goal!
Views
[Figure labels: TABLE 1, TABLE 2, VIEW 3, VIEW]
Views
View Usage Rules
http://craigsmullins.com/dbta_115.htm
Types of SQL
• Data Control Language (DCL)
• Data Definition Language (DDL)
• Data Manipulation Language (DML)
Temporal Data Support