
Database Administration:

The Complete Guide to Practices and Procedures

Chapter 4
Database Design
Agenda
• From Logical Model to Physical Database
• Database Performance Design
• Denormalization
• Views
• Data Definition Language
• Temporal Data Support
• Questions
Terminology Summary

| Common Term  | Graphic Term | Design Term          | DP Term                        | Relational Term | Object-Oriented Term |
|--------------|--------------|----------------------|--------------------------------|-----------------|----------------------|
| File Cabinet | Table        | Entity               | File                           | Relation, Table | Type, ADT, Class     |
| File Folder  | Row          | Occurrence or Record | Record                         | Tuple, Row      | Instance, Object     |
| Fact         | Column       | Attribute            | Field, Data Item, Data Element | Column (Domain) | Property             |
| Index        | Key          | Primary Identifier   | Record Key                     | Primary Key     | Object Identifier    |
Physical Database Design
Requirements
• In-depth knowledge of the database objects supported by the
DBMS and the physical structures and files required to support
those objects
• Details regarding the manner in which the DBMS supports indexing,
referential integrity, constraints, data types, and other features
that augment the functionality of database objects
• Detailed knowledge of new and obsolete features for particular
versions or releases of the DBMS
• Knowledge of the DBMS configuration parameters that are in
place
• Data definition language (DDL) skills to translate the physical
design into actual database objects
Basic Physical ROTs (Rules of Thumb)
• Avoid using default settings
– They are rarely the best setting
– It is better to know and explicitly state the actual setting
you desire in each case
• Synchronize the logical and physical models
– Always map changes in one to the other
• Performance before aesthetics
– Meaning: prefer a fully normalized design, but deviate
when necessary to achieve performance goals
• Almost never say "always" or "never"
Transforming Logical to Physical
• Translate the logical model into a physical database
– Create DDL
– Entities to tables, attributes to columns, relationships and keys to
referential integrity (e.g., DB2 RI) and indexes, etc.
– … but differences CAN and WILL occur
• Create storage structures for the database
– Files for data and indexes
– Partitioning
– Clustering
– Placement
– Order of columns

http://datatechnologytoday.wordpress.com/2011/11/19/an-introduction-to-database-design-from-logical-to-physica
Transform Entities to Tables
• First general step:
– Map each entity in the logical data model to a
table in the database
• Things may, or may not, be that easy
– Denormalization?

PAYMENT
– Payment Transaction Num
– Type
– Amount
– PaymentDate
– Status
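A minimal DDL sketch of how this entity might map to a physical table, assuming Payment Transaction Num is the identifier (the column names, data types, and lengths are illustrative choices, not dictated by the logical model):

CREATE TABLE PAYMENT
    (PAYMENT_TXN_NUM  INTEGER       NOT NULL,
     PAYMENT_TYPE     CHAR(2)       NOT NULL,
     AMOUNT           DECIMAL(9,2)  NOT NULL,
     PAYMENT_DATE     DATE          NOT NULL,
     STATUS           CHAR(1)       NOT NULL,
     PRIMARY KEY (PAYMENT_TXN_NUM));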
Transform Attributes to Columns
• Attributes become columns
• Transform Domains to Data Types
– Most commercial DBMSs do not support domains
– Data Type and Length
• Variable or Fixed Length
• Choose wisely; impacts data quality
– Constraints
– Null

http://craigsmullins.com/dbta_072.htm
These data types allow database systems to store different kinds of information efficiently:

| Data Type  | Description                                  | Size / Format                         | Use Case                                      |
|------------|----------------------------------------------|---------------------------------------|-----------------------------------------------|
| CHAR       | Fixed-length string                          | Fixed-length (e.g., CHAR(10))          | Codes, abbreviations, fixed-size data         |
| VARCHAR    | Variable-length string                       | Variable-length (e.g., VARCHAR(255))   | Names, descriptions, varying-length text      |
| CLOB       | Large text data                              | Several GBs                            | Large text fields, documents (XML, JSON)      |
| DBCLOB     | Large double-byte character text             | Several GBs                            | Large multilingual text (e.g., Unicode)       |
| BLOB       | Binary large object (multimedia)             | Several GBs                            | Storing images, videos, audio, encrypted data |
| GRAPHIC    | Fixed-length double-byte character string    | Fixed-length                           | Non-Latin or special character sets           |
| VARGRAPHIC | Variable-length double-byte character string | Variable-length                        | Varying-length multilingual text              |
| DATE       | Date (no time)                               | YYYY-MM-DD                             | Birth dates, event dates                      |
| TIME       | Time (no date)                               | HH:MM:SS                               | Appointment times, log timestamps             |
| DATETIME   | Date and time                                | YYYY-MM-DD HH:MM:SS                    | Storing events with date and time             |
| TIMESTAMP  | Date, time, and fractional seconds           | YYYY-MM-DD HH:MM:SS.ffffff             | High-precision events                         |
Nulls

http://craigsmullins.com/dbta_043.htm
Default
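A brief sketch of how nullability and default values are declared in DDL (the table and its columns are illustrative assumptions):

CREATE TABLE CUSTOMER
    (CUST_ID      INTEGER      NOT NULL,
     LAST_NAME    VARCHAR(40)  NOT NULL,
     MIDDLE_INIT  CHAR(1),                           -- nullable: value may be unknown
     STATUS       CHAR(1)      NOT NULL DEFAULT 'A', -- default used when no value is supplied
     CREATED_DT   DATE         NOT NULL DEFAULT CURRENT_DATE);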
Column Ordering
• Sequence columns based on logging. For example:
– Infrequently updated non-variable columns first
– Static (infrequently updated) variable columns
– Frequently updated columns last
– Columns frequently modified together should be placed next to each other

Example row layout:

| CUST ID | FIRST NAME | LAST NAME | ADDRESS | ACCT BAL |

– CUST ID and FIRST NAME: static, infrequently updated
– LAST NAME and ADDRESS: frequently updated at the same time (e.g., marriage),
but infrequently updated overall
– ACCT BAL: frequently updated
Determine Row Size
Relationships and Keys
• Use the primary key as assigned in the
logical modeling phase for the physical PK
• Other considerations:
– Length of the key
– Surrogate key
– ROWID / SEQUENCE / Identity
• Build referential constraints for all
relationships
– Foreign keys
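A hedged DDL sketch of carrying the primary key forward and building a referential constraint (the names are illustrative; the CUSTOMER table is assumed to exist):

CREATE TABLE CUST_ORDER
    (ORDER_NUM   INTEGER  NOT NULL,
     CUST_ID     INTEGER  NOT NULL,
     ORDER_DATE  DATE     NOT NULL,
     PRIMARY KEY (ORDER_NUM),
     FOREIGN KEY (CUST_ID) REFERENCES CUSTOMER (CUST_ID)
         ON DELETE RESTRICT);  -- reject deletes of customers that still have orders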
Build Physical Structures
| Feature          | Table Spaces | DBSpaces | Data Spaces | Filegroups |
|------------------|--------------|----------|-------------|------------|
| Definition       | Logical storage areas for database objects. | Logical storage structure for tables, indexes, and other database objects in Informix. | Logical areas of memory for large datasets (mainframes). | Logical groupings of files in SQL Server. |
| Database Systems | Oracle, DB2, PostgreSQL | IBM Informix | IBM Db2 (z/OS) | Microsoft SQL Server |
| Physical/Logical | Logical (mapping to physical files). | Logical (mapping to physical storage chunks). | Logical memory areas for faster access. | Logical (mapping to physical data files). |
| Purpose          | Organize and optimize the storage of database objects. | Manage physical storage of database objects and optimize disk usage. | Efficient memory management for large datasets. | Optimize storage management and performance by grouping objects. |
| Use Case         | Separating tables, indexes, and other objects across different files/disks for better performance. | Managing physical storage in Informix environments. | High-performance memory access in mainframes. | Managing files across disks, often for large tables or indexes. |
Storage Planning
• Start by determining how many rows are
required
• Calculate the row size
• Figure out the number of rows per
block/page
• Divide the total number of rows by the rows
per page to get the number of pages, then
multiply the number of pages by the page size
• This gives you the size of the object
• … except for free space, which must also be added in

http://craigsmullins.com/dbta_110.htm
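A worked example with illustrative numbers: for 1,000,000 rows of 200 bytes each on 4,096-byte pages, each page holds 20 rows (4,096 / 200, rounded down). 1,000,000 / 20 = 50,000 pages, and 50,000 × 4,096 bytes ≈ 205 MB, before free space is added.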
Free Space

In summary, PCTFREE and FREEPAGE are critical tools that a DBA can use to optimize database performance by managing how free space is allocated within data pages. PCTFREE focuses on reserving space for updates, while FREEPAGE is used to manage space for frequent inserts. A careful balance of these two settings can help avoid performance bottlenecks related to page splits, row migrations, and excessive reorganizations.
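A hedged Db2-style sketch of setting these values at CREATE time (the object names are illustrative, and the exact clauses vary by DBMS):

CREATE TABLESPACE TS_SALES
    IN DB_SALES
    USING STOGROUP SG_PROD
    FREEPAGE 15     -- leave one empty page after every 15 pages, for inserts
    PCTFREE 10;     -- reserve 10 percent of each page, for updates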
Types of Files
• Data / Index
– Both require storage
• Raw Files
– Can be used to bypass the O/S
• Solid State Devices
– For performance-critical objects
– Storing databases having extreme performance
requirements on solid state devices instead of disk
can improve the performance of many database
operations
Database Performance Design
• Designing Indexes
– Partitioning
– Clustering
• Hashing
• Interleaving Data
Designing Indexes
B-Tree Index
Level 1 (Root Page):      98 : 302
Level 2 (Nonleaf Pages):  53 : 98   |  108 : 302
Level 3 (Nonleaf Pages):  11 : 53   |  59 : 98   |  …
Level 4 (Leaf Pages):     … 11/Ptr  |  … 53/Ptr  |  … 59/Ptr  |  … 98/Ptr

The pointers in the leaf pages lead to the data in the table.
B-Tree Index Example (Conceptual):
Imagine you have a table called Employees with the following rows:

| Employee_ID | Name  | Department | Salary |
|-------------|-------|------------|--------|
| 101         | Alice | HR         | 70,000 |
| 102         | Bob   | IT         | 80,000 |
| 103         | Carol | Marketing  | 60,000 |
| 104         | Dave  | IT         | 90,000 |
| 105         | Eve   | HR         | 75,000 |

Suppose you create an index on the Employee_ID column. The B-tree
index for this column might look something like this:

           [103]
          /     \
[101, 102]       [104, 105]
In this B-tree:
• The root node contains 103, splitting the table into two parts.
• The left subtree contains Employee IDs less than 103 (i.e., 101, 102).
• The right subtree contains Employee IDs greater than 103 (i.e., 104, 105).
When you search for 104, the search starts at the root (103):
• Since 104 is greater than 103, the search moves to the right
subtree.
• In the right subtree, it finds the correct node (104), completing
the search efficiently.
The B-tree ensures that the search for 104 is done in logarithmic time,
reducing the number of comparisons needed as the tree grows.
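The index in this example could be created with a simple statement such as the following (the index name is an illustrative assumption):

CREATE INDEX idx_employee_id ON Employees (Employee_ID);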
Bitmap Index
Structure of a Bitmap Index
Consider a simple table called Employees with a Gender column:

| Employee_ID | Name  | Gender |
|-------------|-------|--------|
| 101         | Alice | Female |
| 102         | Bob   | Male   |
| 103         | Carol | Female |
| 104         | Dave  | Male   |
| 105         | Eve   | Female |

For the Gender column, a bitmap index might look like this:
• Bitmap for 'Female': 1 0 1 0 1 (a 1 indicates that the corresponding row contains 'Female')
• Bitmap for 'Male': 0 1 0 1 0 (a 1 indicates that the corresponding row contains 'Male')

CREATE BITMAP INDEX idx_gender ON Employees (Gender);

SELECT * FROM Employees WHERE Gender = 'Female';
A second, larger example:

| Identifier | Gender | Bitmap     |
|------------|--------|------------|
| 1          | Female | 0110000010 |
| 2          | Male   | 1000011101 |

Here a bitmap index is created on the Gender column of an EMPLOYEE table that
contains ten rows. The bitmap index contains two strings, each with ten bits.
The strings are positional: wherever a position has its bit turned on ("1"),
the Gender column in that row contains the value for which that particular
string was built. Examine the two bitmaps:

'Male'   1000011101
'Female' 0110000010

These strings indicate that rows 1, 6, 7, 8, and 10 are males; rows 2, 3, and 9
are females.
Other Types of Indexes
• Reverse Key Index
– A type of B-tree index where the order of the bytes of
each indexed column is reversed. This approach is
particularly useful for reducing contention on
frequently accessed keys (see the sketch below).
• Partitioned Index
– A B-tree index specifying how to break up the index
(and perhaps the underlying table) into separate
chunks, or partitions, to enhance performance and
availability.
• Ordered Index
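A minimal sketch of the reverse key index mentioned above, assuming Oracle-style syntax (the table and index names are illustrative):

-- Byte-reverses ORDER_ID in the index, spreading sequential key values
-- across many leaf pages to reduce insert hot spots
CREATE INDEX idx_order_id_rev ON Orders (Order_ID) REVERSE;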
Partitioning
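The syntax is product-specific, but a hedged sketch of range partitioning (assuming Oracle-style PARTITION BY RANGE; the names and boundary values are illustrative):

CREATE TABLE Sales_Hist
    (Sale_ID    INT,
     Sale_Date  DATE,
     Amount     DECIMAL(10,2))
PARTITION BY RANGE (Sale_Date)
    (PARTITION p2023 VALUES LESS THAN (DATE '2024-01-01'),  -- rows before 2024
     PARTITION p2024 VALUES LESS THAN (DATE '2025-01-01'),  -- rows during 2024
     PARTITION pmax  VALUES LESS THAN (MAXVALUE));          -- everything later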
Hashing

[Figure: keys (e.g., LAST_NAME values BLAKE, JACKSON, JOHNSON, MULLINS, NEELD) pass through a hash algorithm that maps each key to a storage location; keys that collide, such as NEELD, are placed in an overflow area.]

A minimal Python sketch of the same idea follows, using division hashing (modulo) with chaining to handle collisions:
def hash_function(user_id, table_size):
    return user_id % table_size

# Initialize the hash table
table_size = 5
hash_table = [[] for _ in range(table_size)]  # Each slot starts as an empty list for chaining

# Sample user data
users = [
    {"User_ID": 1, "Username": "Alice"},
    {"User_ID": 2, "Username": "Bob"},
    {"User_ID": 3, "Username": "Carol"},
    {"User_ID": 6, "Username": "David"},  # This will cause a collision with User_ID 1
    {"User_ID": 7, "Username": "Eve"}     # This will cause a collision with User_ID 2
]

# Insert users into the hash table
for user in users:
    user_id = user["User_ID"]
    index = hash_function(user_id, table_size)
    hash_table[index].append(user)  # Using chaining to handle collisions

# Function to retrieve a user
def get_user(user_id):
    index = hash_function(user_id, table_size)
    for user in hash_table[index]:
        if user["User_ID"] == user_id:
            return user
    return None  # Return None if the user is not found

# Example of retrieving a user
retrieved_user = get_user(6)  # Should retrieve David
print(retrieved_user)  # Output: {'User_ID': 6, 'Username': 'David'}
Current State of the Hash Table
After inserting the users, the hash table will look like this:

| Index | Stored Users |
|-------|--------------|
| 0     | [] |
| 1     | [ {"User_ID": 1, "Username": "Alice"}, {"User_ID": 6, "Username": "David"} ] |
| 2     | [ {"User_ID": 2, "Username": "Bob"}, {"User_ID": 7, "Username": "Eve"} ] |
| 3     | [ {"User_ID": 3, "Username": "Carol"} ] |
| 4     | [] |

• Index 1 contains Alice and David (1 % 5 and 6 % 5 both equal 1, so chaining stores them together).
• Index 2 contains Bob and Eve (2 % 5 and 7 % 5 both equal 2).
• Index 3 contains Carol.
Clustering
CREATE TABLE Sales (
Sale_ID INT PRIMARY KEY,
Sale_Date DATE,
Amount DECIMAL(10, 2),
Customer_ID INT
);
Creating a Clustered Index: Now we'll create a clustered index on the Sale_Date column.
This will organize the rows of the Sales table physically based on the sale date.
CREATE CLUSTERED INDEX idx_sale_date ON Sales (Sale_Date);
INSERT INTO Sales (Sale_ID, Sale_Date,
Amount, Customer_ID)
VALUES
(1, '2024-10-01', 100.00, 101),
(2, '2024-10-03', 150.00, 102),
(3, '2024-10-02', 200.00, 101);
After executing the above, the data is stored in
the following order:

| Sale_ID | Sale_Date  | Amount | Customer_ID |
|---------|------------|--------|-------------|
| 1       | 2024-10-01 | 100.00 | 101         |
| 3       | 2024-10-02 | 200.00 | 101         |
| 2       | 2024-10-03 | 150.00 | 102         |

A range query over the clustering key can then read the qualifying rows sequentially:

SELECT * FROM Sales
WHERE Sale_Date BETWEEN '2024-10-01' AND '2024-10-02';
Interleaving Data

Techniques: interleaving can involve several methods, such as:
• Row-Column Interleaving: data is distributed across rows and
columns, which can improve access patterns for specific queries.
• Block Interleaving: blocks of data are distributed in a way that
balances load across storage devices or nodes.

[Figure: blocks belonging to Table 1 and Table 2 interleaved within a single database file on a disk drive.]
Denormalization
• Prejoined Tables - when the cost of joining is prohibitive
• Report Tables - for specialized critical reports (e.g. CEO)
• Mirror Tables - when two types of environments require concurrent access to
the same data (OLTP vs DSS)
• Split Tables - when distinct groups/apps use different parts of the same table
– Splitting columns across two tables for long variable character columns.
• Combined Tables - to eliminate one-to-one relationships
• Redundant Data - to reduce the number of joins for a single column (e.g.
definitional, CA to California)
• Repeating Groups - to reduce overall I/O (& possibly DASD)
• Derivable Data - to eliminate calculations & aggregations
• Speed Tables - to support hierarchies
• Physical Implementation Needs – e.g., to fit rows within the page size

http://www.tdan.com/view-articles/4142
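As one hedged illustration of the prejoined-table technique, a denormalized table can be built from the normalized base tables and refreshed periodically (the names are illustrative; CREATE TABLE ... AS SELECT syntax varies by DBMS):

-- Avoids repeating the CUSTOMER-to-order join at query time;
-- must be refreshed (or maintained by triggers) as the base tables change
CREATE TABLE CUST_ORDER_PREJOINED AS
SELECT c.CUST_ID, c.LAST_NAME, o.ORDER_NUM, o.ORDER_DATE
FROM   CUSTOMER c
JOIN   CUST_ORDER o ON o.CUST_ID = c.CUST_ID;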
When to Denormalize
The only reason to denormalize, ever:
• To achieve optimal performance!
• If the database design achieves satisfactory performance
fully normalized, then there is no need to denormalize.

You should always consider the following issues before
denormalizing:
• Can the system achieve acceptable performance
without denormalizing?
• Will the performance of the system after denormalizing
still be unacceptable?
• Will the system be less reliable due to denormalization?
Denormalization Administration
The decision to denormalize should never be made
lightly, because it can cause integrity problems and
involve a lot of administration.

Additional administration tasks include:


• Documenting every denormalization decision
• Ensuring that all data remains valid and accurate
• Scheduling data migration and propagation jobs
• Keeping end users informed about the state of the
tables
• Analyzing the database periodically to decide whether
denormalization is still required
Normalized vs. Denormalized

The Goal!
Views

[Figure: views (VIEW, VIEW 3) defined over base tables (TABLE 1, TABLE 2).]
Views
View Usage Rules

• Security - row and column level


• Access - efficient access paths
• Data Derivation - put the calculations in the view
• Mask Complexity - hide complex SQL from users
• Rename a Table
• Column Renaming - table with better column
names (easier to use than AS)
• Synchronize all views with base tables...

DO NOT USE ONE VIEW PER BASE TABLE!

http://craigsmullins.com/dbta_115.htm
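A hedged sketch of a view that renames columns and derives data in line with the rules above (the base table, columns, and derivation are illustrative assumptions):

CREATE VIEW EMP_COMPENSATION
    (EMPLOYEE_ID, EMPLOYEE_NAME, TOTAL_PAY) AS
SELECT EMP_ID,
       EMP_NAME,
       SALARY + COALESCE(BONUS, 0)  -- derivation lives in the view, not in user queries
FROM   EMPLOYEE
WHERE  STATUS = 'A';                -- active employees only (row-level filtering)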
Types of SQL

• Control – Data Control Language (DCL)

• Definition – Data Definition Language (DDL)

• Manipulation – Data Manipulation Language (DML)
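One statement of each type, as a minimal illustration (the object names are assumptions):

GRANT SELECT ON Employees TO reporting_user;               -- Control (DCL)
CREATE TABLE Departments (Dept_ID INT PRIMARY KEY,
                          Dept_Name VARCHAR(50));          -- Definition (DDL)
SELECT Dept_Name FROM Departments WHERE Dept_ID = 10;      -- Manipulation (DML)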
Temporal Data Support

• Many types of data change over time, and different users


and applications have requirements to access the data at
different points in time.
– Instead of creating separate history tables, using triggers, and/or
implementing snapshot tables, a DBMS with temporal features can
manage the time aspect of data.
• There are two types of temporal data supported:
– Business Time
– System Time
Temporal Data:
Business Time vs. System Time
• Business Time (aka application time or valid time)
– Specifies when the facts stored in the database
are true with respect to the real world.
– These are the dates of interest to the business
user interacting with the data.
– Business time is useful only for certain types of data that
change over time, where the validity of the data is
relevant to the application and users.
• System Time (aka transaction time)
– Denotes the time when the fact became current in the database.
– System time can be used to track the insertion and modification history
of the data.
– Unlike business time, transaction time may be associated with any
database entity.
| Feature     | Business Time                                    | System Time                                           |
|-------------|--------------------------------------------------|-------------------------------------------------------|
| Definition  | Reflects validity of data in a business context  | Reflects when data is stored or changed in the system |
| Perspective | User-centric (business operations)               | System-centric (database operations)                  |
| Focus       | Validity period of business facts                | Lifecycle and transaction history of records          |
| Example     | Pricing effective from January 2024              | Record created on 2024-01-15 10:00:00                 |
| Use Cases   | Historical reporting, auditing                   | Data management, system auditing                      |
A DBMS Can Support Both Business
Time and System Time
• Both are implemented via a time period specification
• Business Time is tracked in a single table.
– Beginning and Ending time periods indicate which rows
apply to which time period
• System Time is tracked using two tables.
– One table contains the current data.
– Another, history table, contains the non-current data.
– Still requires Beginning and Ending times to indicate
which rows apply to which time period
• A single "logical" table can be set up
for both business and system time
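A hedged sketch using Db2-style temporal syntax (the table and column names are illustrative; other DBMSs express this differently):

-- Business time: a single table with period columns
CREATE TABLE POLICY
    (POLICY_ID  INTEGER NOT NULL,
     COVERAGE   INTEGER,
     BUS_START  DATE NOT NULL,
     BUS_END    DATE NOT NULL,
     PERIOD BUSINESS_TIME (BUS_START, BUS_END),
     PRIMARY KEY (POLICY_ID, BUSINESS_TIME WITHOUT OVERLAPS));

-- System time: the base table is paired with a history table
ALTER TABLE POLICY
    ADD COLUMN SYS_START TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW BEGIN
    ADD COLUMN SYS_END   TIMESTAMP(12) NOT NULL GENERATED ALWAYS AS ROW END
    ADD COLUMN TRANS_ID  TIMESTAMP(12) GENERATED ALWAYS AS TRANSACTION START ID
    ADD PERIOD SYSTEM_TIME (SYS_START, SYS_END);

CREATE TABLE POLICY_HISTORY LIKE POLICY;
ALTER TABLE POLICY ADD VERSIONING USE HISTORY TABLE POLICY_HISTORY;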
A Temporal Example
• Why would you need temporal data management?
– Consider an INSURANCE company example
• The terms of any specific insurance policy are valid over a period of time.
• After that period of time, customers can choose to decline further coverage, continue
with the existing coverage, or modify the terms of their coverage.
• So at any specific point in time, the terms of the customers’ policy can differ.
– Over time, customers make claims against their policies. This claim
information needs to be stored, managed, and analyzed.
• Accident histories for customers are also important pieces of data with a temporal
element.
– Consider the complexity of trying to develop not only a database design
that accommodates changing policies, claims, and historical details, but
also enables queries such that a user might access a customer’s
coverage at a given point in time.
• Example: what policies were in effect for that customer as of, say, April 15, 2012? Or
any other date during which the customer had coverage?
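With temporal support like the Db2-style sketch shown earlier, that "as of" question becomes a single query (a CUSTOMER_ID column is assumed here for illustration):

-- Policies in effect for a customer on April 15, 2012 (business time)
SELECT *
FROM   POLICY FOR BUSINESS_TIME AS OF '2012-04-15'
WHERE  CUSTOMER_ID = 1001;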
Questions
