
GROUP 5 Physical Database Design and Performance


Physical Database Design and Performance
Database design basics
• Determining data to be stored
• Logically structuring data
• ER diagram
• A design process suggestion for Microsoft Access

• Normalization
• Physical design
What is a database?
A database is an organized collection of structured information, or
data, typically stored electronically in a computer system. A
database is usually controlled by a database
management system (DBMS). Together, the data and the DBMS,
along with the applications that are associated with them, are
referred to as a database system, often shortened to just database.
Database design basics
A properly designed database provides you
with access to up-to-date, accurate
information. Because a correct design is
essential to achieving your goals in working
with a database, investing the time required
to learn the principles of good design makes
sense. In the end, you are much more likely to
end up with a database that meets your
needs and can easily accommodate change.
Determining
data to be
stored
In a majority of cases, a person who is doing the
design of a database is a person with expertise in the
area of database design, rather than expertise in the
domain from which the data to be stored is
drawn. Therefore, the data to be stored in the database
must be determined in cooperation with a person who
does have expertise in that domain, and who is aware
of what data must be stored within the system.
Determining data relationships

Once a database designer is aware of the data to be stored within the
database, they must determine where dependencies exist within the data.
Sometimes, when one piece of data is changed, other data that is not
visible changes with it.
A good database design is, therefore, one that:

01 Divides your information into subject-based tables to reduce
   redundant data.
02 Provides Access with the information it requires to join the
   information in the tables together as needed.
03 Helps support and ensure the accuracy and integrity of your
   information.
04 Accommodates your data processing and reporting needs.
The design process

1. Determine the purpose of your database
2. Find and organize the information required
3. Divide the information into tables
4. Turn information items into columns
5. Specify primary keys
6. Set up the table relationships
7. Refine your design
8. Apply the normalization rules
Determining the purpose of your database
It is a good idea to write down the purpose of the database on paper:
what it is for, how you expect to use it, and who will use it.

For example:
“The customer database keeps a list of customer information for the
purpose of producing mailings and reports.”

If the database is more complex or is used by many people, as often occurs
in a corporate setting, the purpose could easily be a paragraph or more and
should include when and how each person will use the database.
Logically structuring data

Once the relationships and dependencies among the various pieces of
information have been determined, it is possible to arrange the data
into a logical structure which can then be mapped onto the storage
objects supported by the database management system.
This mapping is generally performed so that each set of related data
which depends upon a single object, whether real or abstract, is placed
in a table. Relationships between these dependent objects are then
stored as links between the various objects.
ER diagram
(entity-relationship model)

Database designs also include ER (entity-relationship model) diagrams.
An ER diagram shows the entities in a system and the relationships
between them, which helps in designing the database efficiently.
A design process suggestion for
Microsoft Access
1.Determine the purpose of the database
2.Find and organize the information required
3.Divide the information into tables
4.Turn information items into columns
5.Specify primary keys
6.Set up the table relationships
7.Refine the design
8.Apply the normalization rules
Normalization

Normalization is a systematic way of ensuring that a database structure
is suitable for general-purpose querying and free of certain undesirable
characteristics (insertion, update, and deletion anomalies) that could
lead to loss of data integrity.
Physical design
The physical design of the database specifies the
physical configuration of the database on the storage
media.

• Security
• Replication
• High-availability
• Partitioning
• Backup and restore schemes.
DENORMALIZATION
AND PARTITIONING
DATA
NORMALIZATION VS DENORMALIZATION

• Normalization is used to remove redundant data from the database and
  to store non-redundant, consistent data in it.

• Denormalization is used to combine data from multiple tables into one
  so that it can be queried quickly.
WHAT IS DENORMALIZATION?
DENORMALIZATION

• The opposite of normalization
• Joins tables and allows data to be repeated
• The process of adding redundant columns to the database in order to
  improve performance
PURPOSE OF DENORMALIZATION
• To improve the performance of the database infrastructure by:
  • Reducing the number of tables
  • Reducing the number of joins required during query execution
  • Reducing the number of rows to be retrieved from the primary data
    table
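The trade-off described above can be sketched in a few lines. This is only an illustration: Python's built-in sqlite3 module stands in for whatever DBMS is actually used, and the customer, orders, and orders_denorm tables are hypothetical names that do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: each customer name is stored in exactly one place.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        total       REAL NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);

    -- Denormalized: the customer name is copied into every order row.
    CREATE TABLE orders_denorm AS
        SELECT o.order_id, o.customer_id, c.name AS customer_name, o.total
        FROM orders AS o JOIN customer AS c ON c.customer_id = o.customer_id;
""")

# The normalized design needs a join to show a name next to each order.
normalized = conn.execute("""
    SELECT o.order_id, c.name
    FROM orders AS o JOIN customer AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# The denormalized design answers the same question from a single table.
denormalized = conn.execute(
    "SELECT order_id, customer_name FROM orders_denorm ORDER BY order_id"
).fetchall()

# The price of the redundancy: renaming customer 1 must touch every copy.
rows_to_update = conn.execute(
    "SELECT COUNT(*) FROM orders_denorm WHERE customer_id = 1"
).fetchone()[0]
```

Both queries return the same rows; the denormalized table avoids the join at the cost of keeping two copies of the name 'Ada' consistent.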
WHAT IS DATA PARTITIONING?
DATA PARTITIONING

• Data partitioning is the technique of distributing data across multiple
  tables, disks, or sites in order to improve query processing performance
  or increase database manageability.

• The word "partition" is also used for a section of a storage device,
  such as a hard disk drive or solid-state drive (for example, a recovery
  partition used to repair problems that prevent the operating system
  from booting). That is a different sense of the term from the database
  partitioning discussed here.
PARTITIONING STRATEGIES

1 HORIZONTAL PARTITIONING

2 VERTICAL PARTITIONING
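HORIZONTAL PARTITIONING (often called sharding)

• is a database design principle whereby the rows of a database table are
  held separately, with each partition keeping the full set of columns.
  When the partitions are placed on separate servers, this is usually
  called sharding.

VERTICAL PARTITIONING
• divides the table vertically (by columns), so each new table holds only
  a subset of the columns of the main table.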
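A minimal sketch of the two strategies, using plain Python lists of dictionaries in place of real tables. The sample rows, the column names, and the two helper functions are made up for illustration.

```python
rows = [
    {"id": 1, "name": "Ada", "region": "EU", "notes": "long text A"},
    {"id": 2, "name": "Grace", "region": "US", "notes": "long text B"},
    {"id": 3, "name": "Edgar", "region": "EU", "notes": "long text C"},
]

def horizontal_partition(rows, key):
    """Group whole rows by the value of one column (the partition key)."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts

def vertical_partition(rows, primary_key, column_groups):
    """Split the columns into groups; every group keeps the primary key."""
    return [
        [{primary_key: row[primary_key], **{c: row[c] for c in cols}} for row in rows]
        for cols in column_groups
    ]

# Horizontal: same columns everywhere, different rows in each partition.
by_region = horizontal_partition(rows, "region")

# Vertical: same rows everywhere, different columns in each partition.
hot, cold = vertical_partition(rows, "id", [["name", "region"], ["notes"]])
```

Keeping the primary key in every vertical partition is what allows the original rows to be reassembled later.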
ADVANTAGES AND
DISADVANTAGES
ADVANTAGES OF PARTITIONING

• Organized
• Efficiency: Records used together are grouped together.
• Each partition can be optimized for performance.
• Security and recovery: partitions can be secured, backed up, and
  restored separately.
DISADVANTAGES OF PARTITIONING

• Inconsistent access speed: retrievals that span partitions are slower.
• Extra space or update time: data duplicated across partitions takes
  extra storage and must be updated in every copy.
WHY PARTITION DATA?
IMPROVE PERFORMANCE
• Data access operations on each partition take place over a smaller volume of
data. Correctly done, partitioning can make your system more efficient. Operations
that affect more than one partition can run in parallel.

IMPROVE SECURITY
• In some cases, you can separate sensitive and nonsensitive data into different partitions
and apply different security controls to the sensitive data.

IMPROVE AVAILABILITY
• Separating data across multiple servers avoids a single point of failure. If one
instance fails, only the data in that partition is unavailable. Operations on other
partitions can continue.
DESIGNING PHYSICAL
DATABASE FILE
What is physical design in a database?
• Physical design is the process of producing a description of the
  implementation of the database on secondary storage. It describes the
  base relations, file organizations, and indexes used to achieve
  efficient access to the data, and any associated integrity constraints
  and security measures.
• The conceptual and logical designs were independent of physical
  considerations. Now we not only know that we want a relational model,
  we have also selected a database management system (DBMS), Oracle, and
  we focus on those physical considerations.
• Logical database design is concerned with what to store.
• Physical database design is concerned with how to store it.
Underlying Concepts
• Because physical design is related to how data are physically stored, we
  need to consider a few underlying concepts about physical storage. One
  goal of physical design is optimal performance and storage space
  utilization. Physical design includes data structures and file
  organization, keeping in mind that the database software will
  communicate with your computer’s operating system. Typical concerns
  include:
  • Storage allocations for data and indexes
  • Record descriptions and stored sizes of the actual data
  • Record placement
  • Data compression and encryption
PHYSICAL DESIGN IN DBMS (Naming)
• Note that the table design described above applies to MS Access. Access
  is very “forgiving” when it comes to things like names: you can have
  spaces, and you can pretty much do what you want.
• Other DBMSs, such as Oracle, are stricter. Most DBMSs, in fact, have
  naming restrictions similar to those in Oracle. Attribute names must
  begin with a letter, and you cannot have spaces (although you CAN have
  underscores). You can’t use “reserved words” such as the names of
  functions (max) or datatypes (date).
PHYSICAL DESIGN IS DBMS SPECIFIC (Data types)

Although there is a reasonably small set of primitive data types
(numbers, letters, images, etc.), the way that different DBMSs deal with
those data types, the names they use for them, and the constraints on
them vary.
ALPHANUMERIC DATA
• This is a generic data type that includes letters, numbers, and symbols:
  the sort of “characters” we are used to working with in a word
  processor.
• Oracle calls this type of data CHAR for fixed-length fields (it always
  stores the full length allowed; anything not filled in with “text” is
  padded with spaces) and VARCHAR2 for variable-length fields (it stores
  only the characters actually entered). There is a VARCHAR data type for
  variable-length fields as well; Oracle currently treats it as a synonym
  for VARCHAR2 but may change its behavior in future releases, so
  VARCHAR2 is preferred over VARCHAR.
• MS Access calls this type of field TEXT.
Numeric Data (Number)
• Oracle uses the NUMBER(p,s) format for numbers, where p is the
  precision (how many digits in total) and s is the scale (how many of
  the p digits are after the decimal point). So a number like 9.25 would
  be declared as NUMBER(3,2): there are 3 digits, 2 of which are after
  the decimal point.
• Oracle also allows you to declare numbers by type name, such as INTEGER
  or SMALLINT, but that practice is discouraged for two reasons: (1)
  those typed numbers are given the maximum allowable size for the type,
  usually more than you need, and (2) the functionality may go away in
  future versions.
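The precision and scale arithmetic can be checked with a short Python sketch. The two helper functions are hypothetical names introduced here, not part of Oracle; they only work out which (p, s) a literal needs and whether it fits a given declaration without rounding.

```python
from decimal import Decimal

def precision_and_scale(text):
    """Return (p, s) needed to hold a decimal literal such as '9.25'."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    scale = max(-exponent, 0)                        # digits after the point
    integer_digits = max(len(digits) + exponent, 0)  # digits before the point
    return integer_digits + scale, scale

def fits(text, p, s):
    """True if the value can be held in NUMBER(p, s) without rounding."""
    value_p, value_s = precision_and_scale(text)
    return value_s <= s and (value_p - value_s) <= (p - s)

nine_and_a_quarter = precision_and_scale("9.25")
```

With NUMBER(3,2) only one digit is left for the integer part, so a value such as 12.5 does not fit even though it has just three digits.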
Constraints: Limits on the Data

Keeping in mind enterprise constraints (business rules), determine:
• required data, i.e. NOT NULL
• relational integrity constraints (referential integrity)
• the domain for the data.
• Not Null: The field is never allowed to be empty. Data must be entered at the time the row is created.
  This is not the same as data that are needed eventually; requiring those data might be a policy
  measure instead. Keep in mind that if you specify “not null” as a constraint, you cannot save the row
  of data if that column is blank.
• Unique: No duplicate data when you look across rows. A unique constraint can be used for candidate
  (alternate) keys (Access calls this the “no duplicates” property).
• Primary Key: Adding the primary key constraint automatically includes a not null constraint and a
  unique constraint (plus an index).
• Foreign Key: Using the foreign key constraint enforces referential integrity. Every foreign key must
  match a primary key or a unique constraint on another table. The sequence of data entry is affected
  here: you must enter data into the “main” table before it can be used as a foreign key. (Access uses
  properties within the “relationships” window to enforce referential integrity. Oracle uses a
  “references” clause within the table constraints.)
• Check: A check constraint helps to enforce attribute domains.
  • It uses Boolean logic (“Check SBP >= 0 and SBP <= 350” or “Check SBP Between 0 and 350”). The
    condition is evaluated for each row; if it is false you get an error message and can't store the
    row of data.
  • The check constraint is called a validation rule in Access.
  • In Access, the above rule also acts like a NOT NULL statement. Why? Because null data are not
    between 0 and 350. How would you correct this to allow null values? By adding "...or is null".
    In Oracle, as in standard SQL, a check condition that evaluates to unknown because the value is
    null does not reject the row, so nulls pass unless the column is also declared NOT NULL.

• Oracle example: when you create the table, you could list the check constraint in the
  create table statement:
  Create table vitals (…[list of attributes and data types],
  SBP Number(3) check (SBP between 0 and 350), …
• In Access you would create the SBP column in table design view, then in the properties for
  this column, create the validation rule (the same Between 0 And 350 condition).
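The constraint types listed above can be tried out with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Oracle or Access, so the syntax and error messages are SQLite's. The patient table, the mrn column, and the try_insert helper are assumptions made for the example; only the vitals table and the SBP range come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,           -- primary key
        mrn        TEXT NOT NULL UNIQUE           -- required, no duplicates
    );
    CREATE TABLE vitals (
        vitals_id  INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(patient_id),  -- foreign key
        sbp        INTEGER CHECK (sbp BETWEEN 0 AND 350)             -- domain check
    );
    INSERT INTO patient VALUES (1, 'A-001');
""")

def try_insert(sql):
    """Run one INSERT; return 'ok' or the database's error message."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

results = {
    "valid":        try_insert("INSERT INTO vitals (patient_id, sbp) VALUES (1, 120)"),
    "null_sbp":     try_insert("INSERT INTO vitals (patient_id, sbp) VALUES (1, NULL)"),
    "out_of_range": try_insert("INSERT INTO vitals (patient_id, sbp) VALUES (1, 400)"),
    "no_parent":    try_insert("INSERT INTO vitals (patient_id, sbp) VALUES (99, 120)"),
    "duplicate":    try_insert("INSERT INTO patient VALUES (2, 'A-001')"),
    "missing":      try_insert("INSERT INTO patient (patient_id) VALUES (3)"),
}
```

In SQLite a null SBP passes the check, which matches standard SQL behavior; adding NOT NULL to the column would make the value required.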
• Some DBMSs provide more facilities than others for defining
  enterprise constraints. An example of a complex constraint that is
  valid in standard SQL but would not work in Access (Oracle also rejects
  subqueries inside a check constraint, so in practice a rule like this
  is enforced with a trigger):

CONSTRAINT StaffNotHandlingTooMuch
CHECK (NOT EXISTS (SELECT staffNo
FROM PropertyForRent
GROUP BY staffNo
HAVING COUNT(*) > 100))

• Triggers: These are stored procedures (code) that “fire” automatically
  when data are manipulated, that is, when there is an insert, update, or
  delete statement. Triggers can call (use) other code modules.
  Constraints are faster than triggers, but triggers can express more
  complex rules.
  In Oracle, triggers are written in PL/SQL (the internal Oracle
  programming language).
  Access has traditionally not allowed triggers on tables (Access 2010
  and later add data macros, which serve a similar purpose). If you use
  forms, you can emulate the action of triggers by writing code inside
  the form.
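As a rough illustration of a trigger doing the job of the StaffNotHandlingTooMuch rule, here is a sketch using Python's built-in sqlite3 module. The trigger syntax is SQLite's, not PL/SQL. The propertyNo column, the add_property helper, and the staff numbers are assumptions made for the example.

```python
import sqlite3

MAX_PROPERTIES = 100  # the limit used in the constraint above

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE PropertyForRent (
        propertyNo INTEGER PRIMARY KEY,
        staffNo    TEXT NOT NULL
    );
    -- Fires before every insert and aborts it when the staff member
    -- already handles the maximum number of properties.
    CREATE TRIGGER StaffNotHandlingTooMuch
    BEFORE INSERT ON PropertyForRent
    WHEN (SELECT COUNT(*) FROM PropertyForRent
          WHERE staffNo = NEW.staffNo) >= {MAX_PROPERTIES}
    BEGIN
        SELECT RAISE(ABORT, 'staff member already handles the maximum number of properties');
    END;
""")

def add_property(staff_no):
    """Try to assign one more property; return True if the insert was accepted."""
    try:
        with conn:
            conn.execute("INSERT INTO PropertyForRent (staffNo) VALUES (?)", (staff_no,))
        return True
    except sqlite3.DatabaseError:
        return False

# Attempt five more inserts than the limit allows for one staff member.
accepted = sum(add_property("SG14") for _ in range(MAX_PROPERTIES + 5))
```

Only the first 100 inserts for the same staff member succeed; inserts for other staff members are unaffected.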
MORE ABOUT DOMAIN CONSTRAINTS
• Domain constraints are limits on the data values. What business
  rules can we implement in some physical manner? For example,
  suppose our business rule says that a valid SBP must be between 0
  and 350. We can do this with a check constraint (Oracle) or
  validation rule (Access) as discussed above.
• You might enforce business rules in the user interface instead of at
  the table level: e.g., use a picklist of values and restrict users to
  choosing something from the picklist. This works nicely when the list
  of choices is known. For very small lists, you might consider coding
  the data numerically and using radio buttons, like:
Gender: ◦MALE ◦FEMALE

where Male = 1 and Female = 2

• *In Access, a group of radio buttons can only store NUMERIC data. Each
  choice is given a number code, and the user is allowed to choose only
  ONE of the choices, like the channel selection buttons on a car radio.
Derived data

• Examine the user requirements, ERD, logical data model, and data
  dictionary, and produce a list of derived attributes. A derived
  attribute can be stored in the database or calculated every time it is
  needed. Document what is to be done! Your choice may be based on the
  space needed to store the derived data and the effort to keep it
  consistent with the data from which it is derived, versus the cost
  (query time) of calculating the value each time it is required. The
  less expensive option may be chosen, subject to performance
  constraints.

In a main data table, you would store the CODE. Then you can look up
the value.
The SNDB listGender table is another example.
When we build the interface for our DB, we can use code tables or
“look up” tables (single-column lists of choices) to populate list
boxes and combo boxes. This can simplify data entry for your users:
those interface boxes can be set to display the meaning to the user,
but store the code in the table.
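A small sketch of the code-table idea, again using Python's built-in sqlite3 module as a stand-in DBMS. The listGender name and the 1 = Male, 2 = Female coding come from the slides; the column names and the person table are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Code table: one row per allowed choice.
    CREATE TABLE listGender (code INTEGER PRIMARY KEY, meaning TEXT NOT NULL UNIQUE);
    INSERT INTO listGender VALUES (1, 'Male'), (2, 'Female');

    -- Main data table: stores only the code.
    CREATE TABLE person (
        person_id   INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        gender_code INTEGER REFERENCES listGender(code)
    );
    INSERT INTO person VALUES (1, 'Ada', 2), (2, 'Alan', 1);
""")

# What a combo box would be populated with: show the meaning, store the code.
choices = conn.execute("SELECT code, meaning FROM listGender ORDER BY code").fetchall()

# Reports join back to the code table to display the meaning.
report = conn.execute("""
    SELECT p.name, g.meaning
    FROM person AS p JOIN listGender AS g ON g.code = p.gender_code
    ORDER BY p.person_id
""").fetchall()
```

The foreign key also acts as a domain constraint here, since only codes present in the code table can be stored.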
ELEMENTS OF PHYSICAL
DATABASE FILE DESIGN
• Physical file design includes those aspects of database design that are
  usually not visible to users and applications. The objective of
  physical file design is to optimize performance.
• Physical data independence should not be given up unless significant
  improvements in machine performance can be realized.
• Software dependence on physical data structures adds complexity to
  software design and makes maintenance more difficult.
• Database systems allow software design decisions to be made
  independently of the physical data organization and storage.
Physical file design considers the physical layout of data, addressing
techniques, compaction techniques, and storage devices.
PHYSICAL LAYOUT

Physical layout of the data is concerned with:
• the organization of files and data elements,
• partitioning of data,
• clustering of data,
• file blocking factors to maximize performance and/or reduce space
  requirements.
ADDRESSING TECHNIQUES

Addressing techniques for flat files and relational database tables
include:
• sequential,
• indices,
• inverted lists,
• hashed.

For network and hierarchical databases, addressing techniques include
such things as:
• contiguous lists,
• indices,
• pointer methods,
• bitmaps.
COMPACTION TECHNIQUES

Compaction techniques include:
• the use of packed fields to reduce space requirements and improve
  response time,
• compression of images according to various standards, for example,
  Joint Photographic Experts Group (JPEG), Moving Picture Experts Group
  (MPEG), Digital Video Interactive (DVI),
• compression of text and executables.

Compressed data can reduce space requirements and help performance by:
• increasing the amount of data that can be held in main memory at one
  time,
• reducing the volume of data to be transmitted over communication
  lines.
However, there is some overhead associated with
compressing/decompressing data. The tradeoffs must be evaluated.
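One way to evaluate that tradeoff is simply to measure it. The sketch below uses Python's standard zlib module on made-up, highly repetitive sample data; real data will compress less well, and the timings depend on the machine.

```python
import time
import zlib

# Made-up sample data: a CSV header followed by many identical rows.
text = ("patient_id,sbp,dbp,recorded_at\n" + "1001,120,80,2024-01-01\n" * 5000).encode("utf-8")

start = time.perf_counter()
packed = zlib.compress(text, 6)         # 6 is a middle-of-the-road compression level
compress_seconds = time.perf_counter() - start

start = time.perf_counter()
restored = zlib.decompress(packed)
decompress_seconds = time.perf_counter() - start

ratio = len(packed) / len(text)         # space saved ...
overhead = compress_seconds + decompress_seconds  # ... versus CPU time spent
```

The ratio shows the space (and transmission volume) saved, while the two timings show the overhead paid on every write and read.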
STORAGE DEVICES

Storage devices are selected to define the storage hierarchy, such as:
• Main memory,
• Solid state disk (SSD),
• Magnetic disk,
• Optical disk,
• Tape.
The storage device used should match the response time
requirement. Multiple devices may be used in a buffering or
caching scheme to improve performance.
Using and Selecting
Indexes
Indexing in Databases

• Indexing is a way to optimize the performance of a database by
  minimizing the number of disk accesses required when a query is
  processed. An index is a data structure used to quickly locate and
  access the data in a database.
Indexes are created using a few database columns.
• The first column is the search key, which contains a copy of the primary
  key or candidate key of the table. These values are stored in sorted
  order so that the corresponding data can be accessed quickly.
• The second column is the data reference or pointer, which contains a
  set of pointers holding the address of the disk block where that
  particular key value can be found.
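The effect of an index can be observed by asking the DBMS for its query plan before and after creating one. This is a sketch using Python's built-in sqlite3 module; the patient table, its columns, and the index name are hypothetical, and the exact plan wording differs between SQLite versions and between DBMSs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO patient (last_name, city) VALUES (?, ?)",
    [(f"name{i:05d}", f"city{i % 50}") for i in range(10_000)],
)

def plan(sql, params=()):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

query = "SELECT * FROM patient WHERE last_name = ?"

before = plan(query, ("name04242",))   # no index yet: every row is scanned
conn.execute("CREATE INDEX idx_patient_last_name ON patient (last_name)")
after = plan(query, ("name04242",))    # the index is searched instead
```

Before the index exists the plan reports a scan of the whole table; afterwards it names idx_patient_last_name as the access path.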
Indexing methods are evaluated on several attributes:
• Access Types: The types of access supported, such as value-based
  search, range access, etc.
• Access Time: The time needed to find a particular data element or set
  of elements.
• Insertion Time: The time taken to find the appropriate space and
  insert new data.
• Deletion Time: The time taken to find an item and delete it, as well
  as update the index structure.
• Space Overhead: The additional space required by the index.
In general, indexing methods follow one of two file organization
mechanisms to store the data:

1. Sequential File Organization or Ordered Index File: The indices are
   based on a sorted ordering of the values. This is generally fast and
   is the more traditional storage mechanism. An ordered or sequential
   file organization may store the index in a dense or sparse format.
Dense Index:
• For every search key value in the data file, there is an index record.
• This record contains the search key and also a reference to the first
  data record with that search key value.
Sparse Index:
• An index record appears only for some of the items in the data file.
  Each index record points to a block.
• To locate a record, we find the index record with the largest search key
  value less than or equal to the search key value we are looking for.
• We start at the record pointed to by that index record and proceed
  along the pointers in the file (that is, sequentially) until we find
  the desired record.
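The two lookup procedures can be sketched with in-memory lists standing in for disk blocks. The block size, the sample keys, and the function names are arbitrary choices made for illustration.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # records per block; an arbitrary choice for this sketch

# A data file sorted on the search key, split into fixed-size blocks.
records = [(k, f"row-{k}") for k in (3, 7, 12, 18, 21, 25, 30, 34, 41)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per search key -> (block number, slot in block).
dense = {key: (b, s) for b, block in enumerate(blocks) for s, (key, _) in enumerate(block)}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in blocks]

def dense_lookup(key):
    """Go straight to the record through its own index entry."""
    pos = dense.get(key)
    if pos is None:
        return None
    b, s = pos
    return blocks[b][s][1]

def sparse_lookup(key):
    """Find the largest index entry <= key, then scan that block in order."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None  # key is smaller than every key in the file
    for k, value in blocks[b]:
        if k == key:
            return value
    return None
```

The dense index holds nine entries for nine records, while the sparse index holds only three, at the price of a short sequential scan inside the chosen block.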
2. Hash File Organization: Indices are based on the values being
   distributed uniformly across a range of buckets. The bucket to which
   a value is assigned is determined by a function called a hash
   function.
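A minimal sketch of hash-based placement, with Python lists standing in for buckets on disk. The number of buckets, the use of CRC-32 as the hash function, and the sample keys are all assumptions chosen for illustration.

```python
import zlib

NUM_BUCKETS = 4  # arbitrary; a real system sizes this from the expected data volume

def bucket_for(key):
    """Hash function: map a search key to a bucket number."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_BUCKETS

buckets = [[] for _ in range(NUM_BUCKETS)]

def insert(key, value):
    """Store the record in the bucket chosen by the hash function."""
    buckets[bucket_for(key)].append((key, value))

def lookup(key):
    """Only the one bucket the key hashes to has to be searched."""
    for k, value in buckets[bucket_for(key)]:
        if k == key:
            return value
    return None

for patient_id in range(1000, 1100):
    insert(patient_id, f"row-{patient_id}")
```

A lookup by exact key touches a single bucket, but a range query gets no help from this layout because neighboring keys land in unrelated buckets.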
There are primarily three methods of indexing:
• Clustered Indexing

• Non-Clustered or Secondary Indexing

• Multilevel Indexing
Clustered Indexing
• A clustering index is defined on an ordered data file. The data file is
  ordered on a non-key field. In some cases, the index is created on
  non-primary key columns which may not be unique for each record. In
  such cases, in order to identify the records faster, we group two or
  more columns together to get unique values and create an index out of
  them. This method is known as the clustering index. Basically, records
  with similar characteristics are grouped together and indexes are
  created for these groups.
Non-clustered or Secondary Indexing
• A non-clustered index just tells us where the data lies, i.e. it gives
  us a list of virtual pointers or references to the location where the
  data is actually stored.
Multilevel Indexing
• Multilevel indexing splits a large index into several smaller index
  blocks so that the outermost level can be stored in a single block.
  The outer index blocks point to inner index blocks, which in turn
  point to the data blocks. The outer level can easily be kept in main
  memory with little overhead.
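A two-level version of this idea can be sketched with in-memory lists standing in for index and data blocks. The fan-out of four entries per block, the sample keys, and the find function are assumptions made for the example.

```python
from bisect import bisect_right

FANOUT = 4  # entries per block; an arbitrary choice for this sketch

# Sorted search keys 10, 20, ..., 320, stored four to a data block.
keys = list(range(10, 330, 10))
data_blocks = [keys[i:i + FANOUT] for i in range(0, len(keys), FANOUT)]

# Inner index: the first key of every data block, itself split into blocks.
inner_entries = [block[0] for block in data_blocks]
inner_blocks = [inner_entries[i:i + FANOUT] for i in range(0, len(inner_entries), FANOUT)]

# Outer index: the first key of every inner block; small enough for main memory.
outer = [block[0] for block in inner_blocks]

def find(key):
    """Return (inner block number, data block number) holding key, or None."""
    i = bisect_right(outer, key) - 1
    if i < 0:
        return None  # key is smaller than every key in the file
    # outer[i] is the first entry of inner_blocks[i], so j is never negative.
    j = bisect_right(inner_blocks[i], key) - 1
    d = i * FANOUT + j
    return (i, d) if key in data_blocks[d] else None
```

Each lookup reads one outer block, one inner block, and one data block, instead of searching the whole inner index.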
Designing Database for
Optimal Query Performance
• Parallel query processing: possible when working on a multiprocessor
  system.
• Overriding automatic query optimization: allows query writers to
  override the plan chosen by the automatic optimizer.
