A Practical Guide to
Database Design
Second Edition
Rex Hogan
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made
to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all
materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all
material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been
obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future
reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized
in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying,
microfilming, and recording, or in any information storage or retrieval system, without written permission from the
publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.
copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-
8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that
have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Introduction
• Finally, Chapter 11 describes how to use PHP to build a web-based interface to review
and update data in a database.
The database used in this example is the tracking database created in Chapter 7 and
loaded with the PERL script in Chapter 8. Appendices A and B contain the PHP scripts
used by this interface to update the status flags in that database.
• Appendix C contains the Data Definition Language (DDL) text file generated by the
data modeling tool to create the University database.
Database Administrators
• In addition to the above-mentioned uses, learn how to use an industry-leading data
modeling tool by reviewing the material and exercises in Chapter 5. It includes
instructions on how to create the DDL statements needed to create a physical data-
base. The DDL to create the University database is included as Appendix C.
• Learn how to implement a physical database using either Microsoft Access or SQL
Server by following the exercises in Chapters 6 and 7.
Developers
• Learn how to translate user requirements into a database solution by reviewing the
material and exercises in Chapters 2 through 4.
• Chapter 8 shows how to use the PERL language to identify records containing data of
interest from external files and load them into a table in a database.
• Learn how to use Microsoft Access to develop user interfaces by reviewing the exer-
cises in Chapters 6 and 10.
• Gain an understanding of software used to develop user interfaces by reading
Chapter 9.
• Learn how to use PHP to develop a web-based interface to a database by reviewing
the information and code contained in Chapter 11 and listed in Appendices A and B.
Author
Rex Hogan is the author of A Practical Guide to Database Design (first edition). In addi-
tion, he has written Diagnostic Techniques for IMS Data Bases and coauthored Managing
IMS DataBases with Steve Shapiro and Maxie Zinsmeister.
Rex has more than 40 years of experience as a database administrator and a software
engineer. This includes more than 17 years of experience with Southwestern Bell/AT&T
where he became their lead database specialist. During this period, he also taught various
undergraduate and graduate classes at Washington University’s Center for the Study of
Data Processing in St. Louis, Missouri. He then worked for TRW/Northrop Grumman in
Fair Lakes, VA for 16 years, primarily as a database administrator (DBA)/senior software
engineer in the Intelligence Community where he became a specialist in the rapid design
and development of database systems and applications. Finally, he worked for five years as
a computer scientist for the Air Force Office of Special Investigations, where he developed
computer systems to monitor usage of Air Force Internet traffic.
Chapter 1
Overview of Databases
• Home-based computers are frequently used to manage a personal business, update
spreadsheets, or complete school assignments. Others use them for email, social
interaction with friends and family members, monitoring the Internet for news, or
for entertainment.
• Owners of small businesses use spreadsheets and/or software products such as
QuickBooks to keep track of personal or business expenses.
• Office environments must gather, store, and manage information on a wide range
of topics or subjects, such as customers or clients, appointments, or customer orders.
• Business environments must manage a much wider scope of data covering the
information needed to run or manage the business.
• Workers in government offices need computers to manage their jobs. For those
working as analysts in the Department of Defense (DOD) or in the Intelligence
Community, the nature of the job is continually expanding, requiring analysts to
monitor or track new information or data as it becomes available. Analytical teams
continually face the responsibility of analyzing new and evolving forms of
information to identify and extract what is relevant using the software tools
available to them. Often, that means having little more than the desktop Microsoft
Office products, ranging from Excel to Microsoft Access.
As the data needed by the user or customer community grow in size, complexity, and
importance, the care and feeding of that data requires the use of a database management
system (DBMS) to store, manage, and protect it.
A DBMS1 is a special software package that is designed to define and manage data
within one or more databases. Individual databases, in turn, manage the definition of data
objects/tables in a given subject area and provide controlled user access to that data.
Examples of DBMSs include Structured Query Language (SQL) Server, Oracle, and
Microsoft Access. An SQL Server or Oracle instance would then serve as host to, for exam-
ple, a personnel database.
Consider, for example, an online banking transaction (a single unit of work) that
transfers funds from a savings account to a checking account:
• After logging in and starting the transfer, the software performing the updates first
issues a database update to debit the savings account for the specified amount.
• If that update is successful, it issues an update to credit the checking account by that
amount.
• Upon successful completion, a commit call is issued to commit the changes and
release database locks on the rows being updated. An appropriate message would be
sent to the user confirming that the funds transfer was completed.
• If, however, the update to the checking account failed (e.g., the user entered the wrong
checking account number), a rollback call would be made to reverse all updates made,
and an appropriate error message would be sent to the user. As a result, the database
and the underlying data are left in a clean, consistent state.
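The commit/rollback flow described above can be sketched with Python's built-in sqlite3 module; the account table, account numbers, and amounts below are illustrative inventions, not taken from the book:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance REAL)")
con.execute("INSERT INTO account VALUES ('savings-01', 500.0), ('checking-01', 100.0)")
con.commit()

def transfer(con, from_id, to_id, amount):
    """Debit one account and credit another as a single unit of work."""
    try:
        cur = con.cursor()
        cur.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                    (amount, from_id))
        cur.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                    (amount, to_id))
        if cur.rowcount == 0:   # credit matched no row, e.g., a bad account number
            raise ValueError("unknown account: " + to_id)
        con.commit()            # make both updates permanent and release locks
        return "transfer completed"
    except Exception:
        con.rollback()          # reverse all updates made in this unit of work
        return "transfer failed; no changes made"

print(transfer(con, "savings-01", "checking-01", 50.0))   # transfer completed
print(transfer(con, "savings-01", "no-such-acct", 50.0))  # transfer failed; no changes made
```

The point is that both UPDATE statements belong to one unit of work: the commit makes both permanent, while the rollback discards both, leaving the database in a consistent state.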
The ACID4 properties (atomicity, consistency, isolation, and durability) of database systems
and transactions guarantee the accuracy and availability of data.
1.2.1 Atomicity
Atomicity is the all-or-nothing requirement when making updates: either all updates made
during the unit of work succeed or no updates are made. This protection holds regardless
of what interrupts the unit of work or transaction, including device input/output errors,
network errors, and power failures.
1.2.2 Consistency
Consistency requires that transactions take the database from one valid state to another.
Any and all updates must conform to and enforce any referential integrity5 constraints
defined. (Referential integrity constraints define and control any one-to-many relationships
between tables in a database.)
1.2.3 Isolation
Isolation of database updates involves mechanisms that enable multiple concurrent users
to simultaneously access and update the same data elements within a database.
As database updates occur, locks are transparently placed on the updated rows that prevent
other users from accessing or updating those rows until the updating process commits those
updates and the locks are released. Any processes requesting access to rows being updated
are held/delayed until the updater's commit point is made.
1.2.4 Durability
This feature/requirement ensures that any updates made by a transaction (i.e., a unit of
work completed and updates committed) will survive a subsequent system error or prob-
lem, for example, a system failure or a power or disk failure.
Database systems have mechanisms/features that support a full database backup. In
addition, database systems log updates to nonvolatile devices (a database log file) as updates
are made to the database. If/When necessary, a database can be rebuilt/recovered totally by
first using the database backup to recover all data to the point the backup was made, then
using the database log to reapply all updates made to the database after that point in time.
This subject is covered in more detail in Section 1.6.
For example, each table in MySQL is implemented as a flat file with indexes as
needed to support data retrieval. If any changes are required, for example,
a column is to be added, MySQL creates a new temporary table with the new
column, copies all records from the original file to the new one, and then
deletes and renames the old and new files accordingly.
As a workaround, I then made what I hoped was a one-time modification to the
table, adding spare columns (Spare1, Spare2, Spare3, etc.) with the plan of renaming
these columns if/when needed to reflect application-specific, meaningful names.
That helped, but even then I found that MySQL required too much overhead
for managing large tables.
The ability to dynamically change table definitions can, in most products, be exercised
either through that product's database administrator (DBA) graphical user interface or by
issuing commands at the command line using the product's data definition language (DDL).
The DBA user interface is much easier and quicker to use, but when supporting mission-
critical applications, change management procedures are used to control updates across
multiple environments and platforms, each with its own copy and version of the application
database.
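The command-line alternative can be sketched as a small script that applies a change package of incremental DDL statements in order; SQLite stands in for the production RDBMS here, and the table and index names are assumed purely for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Employee (EmployeeID TEXT PRIMARY KEY, EmployeeLastName TEXT)")

# A "change package": incremental DDL changes accumulated for one release.
change_package = [
    "ALTER TABLE Employee ADD COLUMN EmployeeFirstName TEXT",
    "ALTER TABLE Employee ADD COLUMN DepartmentID TEXT",
    "CREATE INDEX idx_employee_dept ON Employee (DepartmentID)",
]

for ddl in change_package:   # applied in order, as at a command prompt
    con.execute(ddl)
con.commit()

cols = [row[1] for row in con.execute("PRAGMA table_info(Employee)")]
print(cols)   # ['EmployeeID', 'EmployeeLastName', 'EmployeeFirstName', 'DepartmentID']
```

Because the same list of statements is replayed on each platform, the Development, Test, and Production databases end up with identical structures.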
• A Development platform is used to design, develop, and test individual software com-
ponents and tables within a database.
• Incremental changes are made by manually running DDL changes at the command
prompt.
• All incremental changes are accumulated as they are applied, creating a change pack-
age with all modifications needed for that release.
• When all changes have been made and tested for a software release, a Test platform
is used.
• With the test system database configured for the older software release, the change
package is applied and the software release is tested to ensure all updates have been
correctly applied and the software works as intended.
• If errors are found, the change package must be revised as necessary and the entire
update process repeated.
• After changes have been successfully applied and tested on the Test system, the
change package is ready to be applied to the Production platform.
The careful application and use of change packages on the Test platform allows the sched-
uling of downtime of the Production system with the expectation of no surprises when the
updates are applied.
For example, a simple SELECT statement would retrieve the first and last names from the
Employee table where that table's DEPTNUMBER column has a value of 12.
SQL has, of course, many variations that extend the power and flexibility of the
commands issued. As a simpler example, a query using the MAX function
would display the highest annual salary found in the Employee table.
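The two queries the text alludes to are not reproduced in this extract; a hedged reconstruction, using SQLite and column names assumed from the table layouts shown later in the chapter, might look like this:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    EmployeeFirstName TEXT,
    EmployeeLastName TEXT,
    AnnualSalary REAL,
    DeptNumber INTEGER)""")
con.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?, ?)", [
    (1, "Albert", "Smith", 60000.0, 12),
    (2, "Maria", "Jones", 75000.0, 12),
    (3, "Lee", "Chan", 82000.0, 7),
])

# First and last names of employees assigned to department 12
rows = con.execute("""SELECT EmployeeFirstName, EmployeeLastName
                      FROM Employee WHERE DeptNumber = 12""").fetchall()
print(rows)   # [('Albert', 'Smith'), ('Maria', 'Jones')]

# Highest annual salary found in the Employee table
top = con.execute("SELECT MAX(AnnualSalary) FROM Employee").fetchone()[0]
print(top)    # 82000.0
```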
The power of SQL is multiplied by creating a "View"6 over two or more tables, a
virtual object to query against. For example, a View named "Department_Employees"
can be created that combines the Department table with the Employee table, matching
the DepartmentID column in Department with the DepartmentID in Employee.
Using this virtual table, a query against Department_Employees will list all information
for employees that are assigned to Department 12.
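A sketch of such a View, again using SQLite with an abbreviated, assumed column list:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT)")
con.execute("""CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY,
               EmployeeLastName TEXT, DepartmentID INTEGER)""")
con.execute("INSERT INTO Department VALUES (12, 'Marketing'), (7, 'Research')")
con.execute("INSERT INTO Employee VALUES (1, 'Smith', 12), (2, 'Chan', 7)")

# A View stores no data of its own; the join runs each time the View is queried.
con.execute("""CREATE VIEW Department_Employees AS
               SELECT d.DepartmentID, d.DepartmentName,
                      e.EmployeeID, e.EmployeeLastName
               FROM Department d
               JOIN Employee e ON d.DepartmentID = e.DepartmentID""")

rows = con.execute("SELECT * FROM Department_Employees WHERE DepartmentID = 12").fetchall()
print(rows)   # [(12, 'Marketing', 1, 'Smith')]
```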
Developers and users, however, must be aware that SQL implementations are not all
equal.7 RDBMS vendors have worked together over the years, beginning in 1986, to define
and implement SQL within their products. However, at the implementation level, SQL
commands will vary depending on the RDBMS product being used. The major vendors
support the basic SQL standards (i.e., their products will provide some specific
functionality), but the details of implementation will differ. For example, SQL provides
wildcard characters to match either a single character or any number of characters in a
search string. In SQL Server or Oracle, a LIKE predicate using the '%' wildcard will
display all information from the Products table for product names beginning with "DELL."
In Microsoft Access, the same command would use the '*' wildcard instead.
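SQLite follows the standard '%' wildcard, so the SQL Server/Oracle form of the query can be sketched as follows (the Products table and its columns are assumed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Products (ProductName TEXT, Price REAL)")
con.executemany("INSERT INTO Products VALUES (?, ?)", [
    ("DELL XPS 13", 999.0),
    ("DELL Latitude", 799.0),
    ("HP Envy", 899.0),
])

# '%' matches any number of characters; '_' matches exactly one character.
rows = con.execute(
    "SELECT * FROM Products WHERE ProductName LIKE 'DELL%'").fetchall()
print(rows)   # [('DELL XPS 13', 999.0), ('DELL Latitude', 799.0)]
# In Microsoft Access, the equivalent pattern would be written as "DELL*".
```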
Department
DepartmentID
DepartmentName
DepartmentManagerID
Employee
EmployeeID
EmployeeFirstName
EmployeeMiddleName
EmployeeLastName
EmployeeWorkPhoneNumber
EmployeeHomePhoneNumber
EmployeeStreetAddress
EmployeeCity
EmployeeState
EmployeeZipCode
DepartmentID
Employee-Deduction
EmployeeID
DeductionCode
DeductionAmount
When the tables are loaded, information for each occurrence is loaded as a row in the
respective table, and each data element/value is associated with its respective column in
the table. For example, a row in the Department table might carry a DepartmentID of
"SalesArea1" with a DepartmentName of "Northern Virginia Marketing." Looking at the
Employee table, the first row for an employee assigned to the above-mentioned department
might describe an employee named Albert Smith.
Note that the relationship between the Department row for the Northern Virginia
Marketing unit and the Employee row for Albert Smith is determined by the fact that the
DepartmentID, “SalesArea1,” is stored as the value for the DepartmentID column in the
Employee table. This illustrates the one-to-many relationship between Department and
Employee and is referred to as Referential Integrity.
Referential Integrity is fundamental to data integrity and is normally activated and
enforced by the RDBMS as data are inserted or deleted from individual tables. For example,
an Employee row cannot be inserted if the DepartmentID value it carries does not already
exist in the Department table.
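SQLite can illustrate this enforcement; note that SQLite leaves foreign-key checking off by default, so the PRAGMA below must be issued first (table layout abbreviated from Section 1.5):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite leaves enforcement off by default
con.execute("CREATE TABLE Department (DepartmentID TEXT PRIMARY KEY)")
con.execute("""CREATE TABLE Employee (
    EmployeeID INTEGER PRIMARY KEY,
    DepartmentID TEXT REFERENCES Department (DepartmentID))""")

con.execute("INSERT INTO Department VALUES ('SalesArea1')")
con.execute("INSERT INTO Employee VALUES (1, 'SalesArea1')")      # parent row exists

try:
    con.execute("INSERT INTO Employee VALUES (2, 'NoSuchDept')")  # no parent row
except sqlite3.IntegrityError as e:
    print("rejected:", e)   # rejected: FOREIGN KEY constraint failed
```

The constraint enforces the one-to-many relationship: a child row is accepted only when its foreign-key value matches an existing parent row.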
1.6 BACKUP/RECOVERY
All DBMSs have, inherently, built-in backup/recovery systems. If the database fails
(e.g., the underlying disk drive(s) fail), the database can be restored to the point of failure;
that is, restored to contain data up to the last unit of work completed before the failure
occurred. Depending on the DBMS product being used, backup/recovery services include
some combination of the following:
• A mechanism/service to create a complete cold backup (i.e., the data are offline and
not being updated).
• A mechanism/service to create a hot backup (i.e., the database remains online and
may be updated while the backup is being made).
• Depending on the RDBMS, incremental or partial services may be available to record
all changes to the database since the last backup was made.
• A logging service to record all updates to the database as they are being made to a
log file. In a non-RAID (redundant array of inexpensive disks) environment, the log
file(s) should be stored on a different disk drive from that used to store the RDBMS
tables to avoid losing both the data and the log file if the disk drive fails.
• Depending on the RDBMS, dual log files may be maintained to mitigate problems/
issues if errors are encountered when using the primary copy.
Both SQL Server and Oracle support cold, hot, and incremental backups.
As hot backups are taken while the database is being updated, there are additional
technical issues involved in dealing with log records created while the backup is being taken.
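As a minimal illustration of a full backup, Python's sqlite3 module exposes SQLite's online backup API, which copies every page of the source database while the source connection remains open; the table here is invented for the example:

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x INTEGER)")
src.execute("INSERT INTO t VALUES (1), (2), (3)")
src.commit()

# Full backup: every page of the source database is copied to the target,
# while the source connection remains open and usable.
dst = sqlite3.connect(":memory:")
src.backup(dst)

print(dst.execute("SELECT COUNT(*) FROM t").fetchone()[0])   # 3
```

The resulting copy is the starting point for a recovery; a production RDBMS would then reapply its log records on top of it.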
Backup/Recovery operations consist of the following:
• Taking a full backup of the database: This operation takes a complete copy of the data-
base and provides the initial starting point for a future recovery operation. Depending
on the size of the database, cold backups could require a significant amount of time.
• Logging all changes as the database is updated: Logging operations are a funda-
mental component of all RDBMSs. To ensure logs are always available for recovery
operations, dual logging may be used to create two copies when logs are written to
non-RAID devices.
• Recovering the database: A recovery operation starts with the last full backup of the
database and applies all log updates made up to the point of failure.
• Note that the database must be offline to users during the recovery operation.
Depending on the size of the backup and the number of logs required, the time
required could be unacceptable to the user community.
• More frequent cold database backups shorten the time for database recovery but
normally require more downtime for the user community.
• If incremental/partial backups were taken, the recovery would start with the last full
image of the database, apply the latest incremental/partial backup, and apply logs
made after the incremental/partial backup were taken. This approach is more com-
plex to manage but significantly reduces the time required to recover the database.
• Note that this operation requires having a full and usable copy of the database.
If database backups are taken to non-RAID devices, a second copy should be taken
for critical databases to ensure a usable backup file is available if/when needed.
• Recovery operations might also be required in case of extensive user operations/
issues. For example, a full backup of the database is advisable before applying a new
software release for the user. If the software upgrade causes major issues/problems,
the entire system must be restored to a point before the software update was made.
• Service agreements: Major database operations require a database recovery plan,
which details the use and frequency of full and partial database backups. The plan
is included as part of the service agreement drawn up for the user, specifying the
acceptable down time to the user community if database recovery is involved.
1.7 FAILOVER
As noted in Section 1.1, a single instance of an RDBMS may, in turn, serve as host to mul-
tiple databases. Backup and recovery mechanisms have always been used to guarantee the
availability of the data for a database hosted by the RDBMS. Failover mechanisms have
more recently been implemented to guarantee the availability of all databases running on
that RDBMS in the event of a host or RDBMS software failure.
In setting up a failover environment, a remote copy of the source database is created on
another host, either within the same computer complex, in another building, or even in
another city. When normal processing on the host resumes, all log updates are recorded on
the local log and transmitted to the remote log as well. In the event of a system failure, all
connections and processing active on the failed host are switched to the failover host and
RDBMS instance and processing resumes.
If the database is important enough to have failover protection, determination of the
failover site involves a number of issues.
• A failover host at a remote location will of course provide more protection through
the physical separation of the buildings, but a secure high-speed network between
the sites may be cost-prohibitive.
Now let us take a look at the detail involved in setting up and running an RDBMS and its
associated databases.
Oracle provides a written procedure for UNIX installations that is very
straightforward. There are two steps, however, that require a user with root
privileges to perform.
After the RDBMS software has been installed, a database can be created. As part of
that process, the DBA must make decisions on how many files (or Oracle Tablespaces)
will be used to store data, where the various data files will be placed (i.e., what disk
drive(s) to use), and where to store the log files needed for database recovery.
All of these decisions must be made with an eye toward performance. Let us take a minute
to review how data are managed.
• Rows in a table are stored in a page, the smallest amount of data managed by an
RDBMS in I/O operations.
Some RDBMS systems (e.g., Oracle) allow the DBA to specify the page size. A large
page size would be chosen if the data in the table are often processed sequentially;
therefore, with one I/O operation, many rows are retrieved and can be accessed with-
out requiring another I/O operation. A small page size would be chosen if the rows
are processed in a random sequence; the smaller page size would be transferred more
quickly than a large one.
SQL Server, on the other hand, has a fixed page size.
• When a new row is added to a table, the RDBMS will first identify a most desirable
block. Each table will normally have a unique key column (or set of columns), and
the page with the row having the next lowest key value would be the most desirable
location to store the new row.
• After storing the new row, one or more indexes must now be updated to record the
row just added. Each index, in itself, contains a layered set of values, represented in
a tree structure that must be traversed to do a lookup. As a result, for each index,
two or three I/Os may be required to make that update. This operation must then be
repeated for each index associated with that row.
In summary, the addition of just one row to a table requires multiple I/Os to add the row itself
and to update all of the indexes associated with that table. The time required to
perform these operations can be drastically reduced if the files being updated are spread
across multiple devices so that multiple I/O operations can be performed simultaneously.
If the host computer has multiple disk drives available, the following types
of data should be placed on different disk drives:
• Index files should be placed on a different drive(s) from the data files.
• If at all possible, the RDBMS log file must be on a different drive from the data files
to ensure recoverability in case the data drive fails.
If the host computer has nothing but a single disk drive, you can, of course, implement the
RDBMS, its databases, and log file(s) on that host. Of course, additional disk drives would
support a higher performance level and have greater odds of being recoverable should a
failure occur.
RAID8 implementations work well and are used whenever possible to support an RDBMS
and its databases. In each case, a RAID configuration is used to support I/O operations as
if it were one physical drive. The more significant RAID types used are
• RAID 0: RAID 0 stripes or spreads the I/O activity across all drives in the RAID unit.
Although having no protection for device failure, this implementation is used where
speed is the primary factor.
• RAID 1: RAID 1 implementations focus on disk failure by configuring drives such
that every drive used has a backup or copy. All I/Os are targeted to both, and if one
drive fails, the second drive will continue to support all I/O activity.
• RAID 5: RAID 5 stripes data to support fast performance but includes parity func-
tions that allow operations to continue if a single device fails. A minimal RAID 5
system with four disks would stripe data across three drives and use the fourth
for parity. This configuration supports a higher level of performance while pro-
viding protection for a single device failure. However, when a file (or table) is
updated, additional overhead is generated to make not only the file update but
update the parity map as well. If the RDBMS is update intensive, this overhead
could be significant.
• RAID 10: RAID 10 provides the best of both worlds. Data are striped to support fast
I/O, and each disk has a copy to protect against device failure. Expensive, yes, but it
supports the highest performance as well as protecting against device failure.
RAID drives are game changers when configuring an RDBMS and its databases. A single
RAID 5 system could be used to store and manage all data, indexes, and logs, provid-
ing performance enhancements through striping as well as recovery protection against
device failure. The best possible solution, of course, is to use RAID 10 for everything,
totally removing the DBA from concerns about separation of data and indexes and log
files.
Given an understanding of how to install an RDBMS and create the associated
databases, the next step involves how to design a database that will meet the user's needs.
That is covered in Chapter 2.
QUESTIONS
1. Do you consider MySQL to be a database? Why or why not?
2. In the context of a database transaction, what is a unit of work? Why is it important?
3. What are the ACID properties of a RDBMS? Why are they important?
4. In a database recovery operation, what files are used to restore the database? What
does each contain?
5. What’s the difference between a Table and a View?
6. Given the following table structure, write an SQL query to find the EmployeeID and
employee name from the Employee table for those assigned to Department 20.
Employee
EmployeeID
EmployeeFirstName
EmployeeMiddleName
EmployeeLastName
EmployeeWorkPhoneNumber
EmployeeHomePhoneNumber
EmployeeStreetAddress
EmployeeCity
EmployeeState
EmployeeZipCode
DepartmentID
7. Are SQL queries identical between products such as Microsoft Access, SQL Server,
and Oracle?
8. Write an SQL query to find the number of employees in Department 20.
9. In writing an SQL query, what’s the difference between = and like?
10. You are part of a team configuring a mission-critical database for failover. What are
the issues in locating the failover instance in the same building, in an adjacent building,
or in a nearby location?
11. What are the differences between RAID 0 and 1?
12. You are asked to install an RDBMS on a desktop computer with two internal drives.
How would you configure the system across those drives to provide the best possible
backup/recovery protection?
13. Why are Referential Integrity constraints important?
14. A new database has a table for Department and another for Employee as shown
in Section 1.5, with Referential Integrity constraints created. Files are available to load
each table. Does it matter in which order the tables are loaded? Why or why not?
15. In setting up backup/recovery operations for a database for use by your business,
what considerations are there in making/storing external backups for the database?
REFERENCES
1. Database management system, Techopedia, Retrieved from https://www.techopedia.com/
definition/24361/database-management-systems-dbms (accessed August 18, 2017).
2. Rouse, M., Transaction, Whatis.com, Retrieved from http://searchcio.techtarget.com/
definition/transaction (accessed August 18, 2017).
3. Database logical unit of work (LUW), SAP Documentation, Retrieved from https://help.
sap.com/saphelp_nwpi71/helpdata/en/41/7af4bca79e11d1950f0000e82de14a/content.htm
(accessed August 18, 2017).
4. Pinal, D., ACID, SQLAuthority.com, Retrieved from https://blog.sqlauthority.com/2007/12/
09/sql-server-acid-atomicity-consistency-isolation-durability/ (accessed August 18, 2017).
5. Referential integrity, Techopedia, Retrieved from https://www.techopedia.com/
definition/1233/referential-integrity-ri (accessed August 18, 2017).
6. Relational database view, essentialSQL, Retrieved from https://www.essentialsql.com/what-
is-a-relational-database-view/ (accessed August 18, 2017).
7. Arvin, T., Comparison of different SQL implementations, Retrieved from http://troels.arvin.
dk/db/rdbms/ (accessed August 18, 2017).
8. RAID, Prepressure.com, Retrieved from https://www.prepressure.com/library/technology/
raid (accessed August 18, 2017).
Chapter 2
Data Normalization
2.1 INTRODUCTION
The first step in designing a database is to decide what needs to be done by identifying
the data requirements of the users. That sounds deceptively simple, and it can be, but it
usually is not. Why? It is because users, the people for whom the system is being built,
rarely can clearly describe what they need. That is bad enough, but the problems are
often compounded quite unintentionally by the data-processing (DP) staff. These good
folks, normally quite knowledgeable in the current system, know in detail what data exist
and how the existing programs work. However, what exists today rarely matches what is
needed in the future system. Often, in-depth knowledge of how the system functions causes
tunnel vision in the DP staff as well as the users and inhibits creative thought in
determining the requirements for a new, enhanced system.
In short, the system design team, composed of user representatives, computer analysts,
and/or programmers, and the database/data administration team, cannot communicate
effectively.
A process called normalization can solve these problems. By using this technique,
users describe what their requirements are without the use of the buzz words and terms
so common in DP. The DP and data administration staff participate in these discussions,
recording data requirements while serving as advisors in the analysis and providing their
insight into what the new system might consist of. Afterward, when all data requirements
have been identified, the technical staff can then decide on the details for a satisfactory
physical database design.
This analysis technique will not be easy for end users to immediately understand and
use. However, only a few basic concepts and definitions are required to begin. With the
proper introduction and coaching, users can fully participate as members of the design
team. Their input is so important to this process that many companies have created
Those of you with DP background have probably matched the terms entity and
attribute with the terms record and field, respectively. That is exactly what they are,
or how they may be implemented. The difference is that entities represent what you
You will soon find that a massive volume of notes will be generated, along with the need
to get this information organized in some way. The easiest thing to do is to enter these
initial entity/attribute lists in a data modeling tool, such as erwin (discussed in Chapter 5).
Data modeling tools are specialized applications that allow you to capture and
record definitions for entities, their associated attributes, and illustrate the asso-
ciation between entities (the data model). In addition, they support mapping of
logical to physical designs and can generate DDL (Data Definition Language)
statements to create tables in DBMS products such as SQL Server and Oracle.
Once these reports are available, review them to verify that each attribute is precisely and
accurately defined. All team participants must agree on these definitions; disagreement
often identifies the need for additional attributes to be created. Fortunately, a data
modeling tool greatly reduces the administrative tasks required.
• Customers will connect to the company’s website, search Advertised_Items for sale,
and place orders for the items selected. The orders are then filled and shipped to the
customer by the stock room.
• Each item advertised has a reorder level. When the number of items in inventory
drops to or below that quantity, available suppliers are searched and additional items
will be ordered from one of those suppliers on the basis of their current selling price.
The data model to be created will include data requirements to create orders for customers,
to fill orders from the stock room, to monitor inventory quantities, and to reorder items
from suppliers as needed.
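The reorder rule just described can be sketched in a few lines. The function and field names below are hypothetical stand-ins, not part of the design developed in this chapter:

```python
# Sketch of the reorder rule: when the quantity on hand falls to or below
# the reorder level, choose the supplier with the lowest current price.
# All names here are illustrative assumptions, not the final design.
def choose_supplier(quantity_on_hand, reorder_level, suppliers):
    """suppliers: list of (supplier_id, current_price) tuples."""
    if quantity_on_hand > reorder_level:
        return None  # inventory is still above the reorder level
    return min(suppliers, key=lambda s: s[1])[0]

# One item is at its reorder level, so the cheapest supplier is chosen.
print(choose_supplier(5, 5, [("S1", 9.50), ("S2", 8.75)]))  # S2
print(choose_supplier(20, 5, [("S1", 9.50)]))               # None
```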
As part of the design team, consider the above-mentioned overview of this environment
and try to come up with five or six entities that seem appropriate (e.g. customer). Next,
try to identify at least four attributes for each (as in Customer Name, Address, and Phone
Number). Be sure to clearly define the meaning of each attribute that you identify.
When you have completed your list, compare your answer to that of Figure 2.1. This will
serve as a starting point for the next phase of analysis, as it contains several intentional
errors which will be resolved as part of the normalization process.
Attribute Description
Customer
CustomerTelephoneNumber The customer’s telephone number
CustomerName The customer’s name
CustomerStreetAddress The street name associated with the customer’s account
CustomerCity The city in which the customer lives
CustomerState The state in which the customer lives
CustomerZipCode The customer’s zip code
CustomerCreditRating The credit rating for this customer
OrderNumber An order number for this customer
Order
OrderNumber A unique identifier for each order
CustomerPhoneNumber The customer’s telephone number
CustomerName The unique name for this customer
OrderDate The date when the order was placed
NumberOfDays The number of days from when the order was placed until shipped
CustomerStreetAddress The street address for where the order is to be shipped
CustomerCity The city to which the order is to be shipped
CustomerState The state to which the order is to be shipped
CustomerZipCode The zip code associated with the shipping address
CustomerCreditCardNumber The credit card number used for this purchase
CustomerCreditCardName The customer’s name on the credit card used
StockNumber The stock number for the item purchased
ShippingDate The date the order was shipped
Advertised_Item
ItemNumber The unique identifier for each Advertised_Item
ItemDescription A description of the item advertised
ClothingFlag A code identifying clothing items
HealthFlag A code identifying items as Health and Beauty
ItemWeight The shipping weight for each item
ItemColor The color of the item
ItemPrice The selling price of the item sold
SupplierCode The unique identifier for the supplier of this item
OrderNumber The order number on which this item appears
Supplier
SupplierID A unique identifier for each supplier
CompanyName The unique name for this supplier
SupplierStreetAddress The street address for this supplier’s main office
SupplierCity The city in which the supplier’s main office is located
SupplierState The state in which the supplier’s main office is located
SupplierZipCode The zip code for the supplier’s main office
StockNumber The unique identifier for each advertised item
Purchased_Item
ItemNumber The unique identifier for each item on an order
ItemDescription The description of the item advertised
QuantityOrdered The number of items purchased
SellingPrice The price of the item purchased
ShippingDate The date the item purchased was shipped to the customer
Advertised_Item Supplier
SupplierCode <== error SupplierID <== error
Using more than one name for the same attribute will cause many problems, including
a failure to recognize one-to-many (1:M) relationships when the data model is developed.
Advertised_Item Supplier
SupplierID <== correction SupplierID
Customer Supplier
CompanyName - The unique name CompanyName - The unique
for this customer <== error name for this supplier <== error
Customer Supplier
CustomerName - The unique name SupplierName - The unique name
for this customer <== correction for this supplier <== correction
Employee
EmployeeAge <== error
EmployeeBirthDate <== error
In this example, removing EmployeeAge will eliminate the error condition. When needed,
the employee’s age can be derived using the EmployeeBirthDate and the current date.
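As a sketch of that derivation (the function name is hypothetical):

```python
from datetime import date

# Deriving age from a birth date at query/application time removes the
# need to store, and keep in sync, a redundant EmployeeAge attribute.
def employee_age(birth_date, today):
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not yet occurred.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

print(employee_age(date(1980, 6, 15), date(2024, 6, 14)))  # 43
print(employee_age(date(1980, 6, 15), date(2024, 6, 15)))  # 44
```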
Employee
Married <== error
Single <== error
Errors of this type often represent values of a larger category. Whenever possible, resolve
the error by creating the larger categorical attribute.
In this case, these two elements can be resolved by creating an attribute of
“MaritalStatus,” which would have a value of either M (Married) or S (Single).
Employee
MaritalStatus—An indicator of
the Employee’s marital status
Study Figure 2.1 and see what suggestions you would make to correct any discrepancies as
defined earlier. When you are finished, compare your list with the comments below, and
the revised entity/attribute list shown in Figure 2.2.
Customer Order
CustomerTelephoneNumber CustomerTelephoneNumber <== correction
Advertised_Item Supplier
ItemNumber <== error StockNumber <== error
Attribute Description
Customer
CustomerTelephoneNumber The customer’s telephone number
CustomerName The customer’s name
CustomerStreetAddress The street name associated with the customer’s account
CustomerCity The city in which the customer lives
CustomerState The state in which the customer lives
CustomerZipCode The customer’s zip code
CustomerCreditRating The credit rating for this customer
OrderNumber An order number for this customer
Order
OrderNumber A unique identifier for each order
CustomerTelephoneNumber The customer’s telephone number
CustomerName The unique name for this customer
OrderDate The date when the order was placed
ShippingStreetAddress The street address for where the order is to be shipped
ShippingCity The city to which the order is to be shipped
ShippingState The state to which the order is to be shipped
ShippingZipCode The zip code associated with the shipping address
CustomerCreditCardNumber The credit card number used for this purchase
CustomerCreditCardName The customer’s name on the credit card used
StockNumber The stock number for the item purchased
ShippingDate The date the order was shipped
Advertised_Item
ItemNumber The unique identifier for each Advertised_Item
ItemDescription A description of the item advertised
ItemDepartment A code classifying the item into one of the various product
categories of items for sale
ItemWeight The shipping weight for each item
ItemColor The color of the item
ItemPrice The selling price of the item sold
SupplierID The unique identifier for the supplier of this item
OrderNumber The order number on which this item appears
Supplier
SupplierID A unique identifier for each supplier
CompanyName The unique name for this supplier
SupplierStreetAddress The street address for this supplier’s main office
SupplierCity The city in which the supplier’s main office is located
SupplierState The state in which the supplier’s main office is located
SupplierZipCode The zip code for the supplier’s main office
ItemNumber The unique identifier for each advertised item
Item_Ordered
ItemNumber The unique identifier for each item on an order
ItemDescription The description of the item advertised
QuantityOrdered The number of items purchased
SellingPrice The price of the item purchased
ShippingDate The date the item purchased was shipped to the customer
The ItemNumber attribute represents the unique identifier for each advertised item, as
does StockNumber in Supplier. The same attribute name must be used in both entities as
they represent the same information.
Advertised_Item Supplier
ItemNumber ItemNumber <== correction
Customer Order
CustomerStreetAddress <== error CustomerStreetAddress <== error
CustomerCity <== error CustomerCity <== error
CustomerState <== error CustomerState <== error
CustomerZipCode<== error CustomerZipCode<== error
The Customer address attributes refer to the address associated with the customer’s account/
home location, but the Order’s address attributes refer to where the order is to be shipped.
The Order attributes must be changed.
Customer Order
CustomerStreetAddress ShippingStreetAddress <== correction
CustomerCity ShippingCity <== correction
CustomerState ShippingState <== correction
CustomerZipCode ShippingZipCode <== correction
Item
ItemDepartment – A code classifying each item into one of the various product categories of items for sale
2.5 NORMALIZATION
Now that a clean entity/attribute list exists in which an attribute has one and only one
name as well as a unique meaning, the normalization process can begin.
More formally stated, normalization is the process of analyzing the dependencies
between attributes within entities. Each attribute is checked against three or more sets of
rules, and adjustments are made as necessary to put each entity in first, second, and third
normal form (3NF). (It is possible you may want to move further to fourth or fifth normal
form [5NF], but in most cases, 3NF is not only adequate, but preferred; more later.) These rules
will be reviewed in detail in the next section and provide a procedural way to make sure
attributes are placed where they belong.
Based on mathematical theory, normalization forms the basis for the implementation
of tables within relational database systems. In practice, it is simply applied common
sense; for example, an employee entity (table) should contain only attributes
(or columns) that describe the employee. If you find an attribute that describes
something else, put it wherever it belongs.
Employee
EmployeeID
DeductionAmount <== error
Practical issues, typically related to performance, may later require you to use tricks of one
kind or another when setting up physical structures. But you are not there yet! Place your
data in 3NF and do all subsequent analysis with that view of data. Later, when all require-
ments are known, and after considering usage requirements, decisions will be made regard-
ing physical structures. If at that time non–third normal structures are needed for reasons
of efficiency, fine. For now, however, it is far too early to make judgments or decisions
related to physical design.
The following steps put the data model into, successively, first, second, and third normal
forms. Keep in mind two points. First, although they may appear overly meticulous, they provide the
user with specific guidance on how to put the data model into 3NF. Second, after you have
developed several data models, the result will appear as common sense to you, and you will
tend to think third normal and create entity/attribute data models in 3NF automatically. So,
although the process appears to be tedious, it really is not.
Department Employee
DepartmentNumber EmployeeNumber
EmployeeNumber <== error – repeating group
Step 1: The repeating attribute must be removed from the entity in which it appears, after
ensuring that the attribute exists in the data model in 1NF.
Attribute Description
Customer
**CustomerIdentifier The alpha-numeric string that uniquely identifies each customer
CustomerTelephoneNumber The customer’s telephone number
CustomerName The customer’s name
CustomerStreetAddress The street name associated with the customer’s account
CustomerCity The city in which the customer lives
CustomerState The state in which the customer lives
CustomerZipCode The customer’s zip code
CustomerCreditRating The credit rating for this customer
OrderNumber An order number for this customer
Order
**OrderNumber A unique identifier for each order
CustomerTelephoneNumber The customer’s telephone number
CustomerName The unique name for this customer
OrderDate The date when the order was placed
ShippingStreetAddress The street address for where the order is to be shipped
ShippingCity The city to which the order is to be shipped
ShippingState The state to which the order is to be shipped
ShippingZipCode The zip code associated with the shipping address
CustomerCreditCardNumber The credit card number used for this purchase
CustomerCreditCardName The customer’s name on the credit card used
StockNumber The stock number for the item purchased
ShippingDate The date the order was shipped
Advertised_Item
**ItemNumber The unique identifier for each Advertised_Item
ItemDescription A description of the item advertised
ItemDepartment A code classifying the item into one of the various product categories
of items for sale
ItemWeight The shipping weight for each item
ItemColor The color of the item
ItemPrice The selling price of the item sold
SupplierID The unique identifier for the supplier of this item
OrderNumber The order number on which this item appears
Supplier
**SupplierID A unique identifier for each supplier
CompanyName The unique name for this supplier
SupplierStreetAddress The street address for this supplier’s main office
SupplierCity The city in which the supplier’s main office is located
SupplierState The state in which the supplier’s main office is located
SupplierZipCode The zip code for the supplier’s main office
ItemNumber The unique identifier for each item
Item_Ordered
**ItemNumber The unique identifier for each item on an order
**OrderNumber A unique identifier for each order
ItemDescription The description of the item advertised
QuantityOrdered The number of items purchased
SellingPrice The price of the item purchased
ShippingDate The date the item purchased was shipped to the customer
You cannot just throw the attribute away. First ensure that the attribute exists where
it belongs. Analyze what the attribute describes and, if necessary, create a new entity
in 1NF. Once an entity exists (or is identified), move the attribute to that entity and
remove the repeating attribute in which it was originally found.
In the above-mentioned example, EmployeeNumber is a repeating group with
Department (departments have more than one employee). As EmployeeNumber
was also found (correctly) in Employee, the error is resolved by simply removing
EmployeeNumber from Department.
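A minimal sketch of the corrected design, using SQLite through Python's standard library (the table and column names follow the example; the sample rows are invented):

```python
import sqlite3

# The corrected 1:M design: EmployeeNumber lives only in Employee, and
# each employee row carries its department's key as a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (
    DepartmentNumber INTEGER PRIMARY KEY
);
CREATE TABLE Employee (
    EmployeeNumber   INTEGER PRIMARY KEY,
    DepartmentNumber INTEGER NOT NULL
        REFERENCES Department(DepartmentNumber)
);
INSERT INTO Department VALUES (10);
INSERT INTO Employee VALUES (1, 10), (2, 10);
""")

# One department, many employees: the 1:M direction in practice.
count = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE DepartmentNumber = 10"
).fetchone()[0]
print(count)  # 2
```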
Step 2: Next, study the relationship between the entity the repeating attribute
came from and the entity it moved to. Determine if the from–to relationship
is one-to-many (1:M) or many-to-many (M:M).
In the above-mentioned Department–Employee example, the from entity is
Department, and the to entity is Employee. To determine if this is a 1:M or M:M, ask
“for one Department, are there one or many employees?”, then repeat the question
in reverse by asking “for one employee, are there one or many departments?”. In this
case, one Department has many employees, but one employee is associated with one
department. Therefore, the relationship is 1:M.
When the relationship is 1:M, this is an acceptable relationship and no further adjust-
ments are necessary. If, on the other hand, the answer is M:M, then one final check/
adjustment is necessary before you move to Step 3.
In reviewing the OrderNumber attribute in the Customer entity, the OrderNumber
attribute already exists within Order. It can therefore be removed from the Customer
entity. Next, in checking the relationship between the Customer and Order entities,
we find that one customer can have many orders, but one order relates to a single
customer. Therefore, there is a 1:M relationship between Customer and Order, and no
further adjustments are necessary.
However, now consider the ItemNumber attribute in Supplier.
• Where does ItemNumber belong? As it is the unique identifier for an
Advertised_Item, it belongs under Advertised_Item and already exists there. It
can therefore be removed from the Supplier entity.
• Is the relationship 1:M or M:M? One item can be purchased from many sup-
pliers, and a single supplier provides many items. Therefore, the relationship
is M:M.
• Now, we have a problem. The M:M relationship between Advertised_Item and
Supplier requires that we have an entity to hold information (attributes) about
one Advertised_Item when purchased from a unique supplier. For example,
how much does that supplier charge for that item, and what was the quality
If it does not already exist, a new entity must be created to store the attributes common
to the entities in the M:M relationship (in this case, Advertised_Item and Supplier);
see Step 3.
Step 3: Convert each M:M relationship into two 1:M relationships by creating
(if necessary) a new derived entity.
A M:M relationship poses a basic design problem because there is no place to store
attributes common to the two entities involved, that is, attributes that lie in the
intersection of the two entities.
Creating a new entity to store the intersection elements transforms the M:M relation-
ship into two 1:M relationships and provides a storage location for the data common
to the two entities involved.
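As an illustrative sketch of such an intersection entity, realized as a SQLite table through Python's standard library (the Reorder_Item name matches the example that follows; the sample rows and prices are invented):

```python
import sqlite3

# Sketch of an intersection (derived) entity resolving an M:M between
# Advertised_Item and Supplier. Its key concatenates both parents' keys,
# and it holds the attributes that belong to the pair, such as the price
# that supplier charges for that item. Sample values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reorder_Item (
    ItemNumber    INTEGER NOT NULL,
    SupplierID    INTEGER NOT NULL,
    PurchasePrice REAL,
    PRIMARY KEY (ItemNumber, SupplierID)
);
-- One item carried by two suppliers, at different prices.
INSERT INTO Reorder_Item VALUES (100, 1, 4.25), (100, 2, 3.90);
""")

rows = conn.execute(
    "SELECT SupplierID, PurchasePrice FROM Reorder_Item "
    "WHERE ItemNumber = 100 ORDER BY PurchasePrice"
).fetchall()
print(rows)  # [(2, 3.9), (1, 4.25)]
```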
In the above-mentioned example, create a new entity Reorder_Item, which repre-
sents information for items that can be ordered from suppliers. This modifies the data
model to include two new 1:M relationships:
Study the information contained in Figure 2.3 and make a list of any adjustments you
feel are necessary to put the entity/attribute list in 1NF. When you are ready, compare
your solution to that shown in Figure 2.4.
For 2NF, each nonkey attribute must depend on the key and all parts of the key.
If the value of an attribute can be determined by knowing only part of the entity’s key,
there is a violation of 2NF.
Attribute Description
Customer
**CustomerIdentifier The alpha-numeric string that uniquely identifies each customer
CustomerTelephoneNumber The customer’s telephone number
CustomerName The customer’s name
CustomerStreetAddress The street name associated with the customer’s account
CustomerCity The city in which the customer lives
CustomerState The state in which the customer lives
CustomerZipCode The customer’s zip code
CustomerCreditRating The credit rating for this customer
Order
**OrderNumber A unique identifier for each order
CustomerTelephoneNumber The customer’s telephone number
CustomerName The unique name for this customer
OrderDate The date when the order was placed
ShippingStreetAddress The street address for where the order is to be shipped
ShippingCity The city to which the order is to be shipped
ShippingState The state to which the order is to be shipped
ShippingZipCode The zip code associated with the shipping address
CustomerCreditCardNumber The credit card number used for this purchase
CustomerCreditCardName The customer’s name on the credit card used
ShippingDate The date the order was shipped
Advertised_Item
**ItemNumber The unique identifier for each Advertised_Item
ItemDescription A description of the item advertised
ItemDepartment A code classifying the item into one of the various product categories
of items for sale
ItemWeight The shipping weight for each item
ItemColor The color of the item
ItemPrice The selling price of the item sold
Supplier
**SupplierID A unique identifier for each supplier
SupplierName The unique name for this supplier
SupplierStreetAddress The street address for this supplier’s main office
SupplierCity The city in which the supplier’s main office is located
SupplierState The state in which the supplier’s main office is located
SupplierZipCode The zip code for the supplier’s main office
Item_Ordered
**ItemNumber The unique identifier for each item on an order
**OrderNumber A unique identifier for each order
ItemDescription The description of the item advertised
QuantityOrdered The number of items purchased
SellingPrice The price of the item purchased
ShippingDate The date the item purchased was shipped to the customer
Restock_Item
**ItemNumber The unique identifier for each item on an order
**SupplierID A unique identifier for each supplier
PurchasePrice The current cost of this item if purchased from this supplier
Consider the following definition of a Payroll Deduction entity to record deductions for
every paycheck.
Payroll Deduction
**EmployeeID
**DateDeductionTaken
EmployeeName <== 2NF error; dependent only on EmployeeID
DeductionType
DeductionAmount
Here, the EmployeeName is an error for 2NF because the name is dependent only on the
EmployeeID and not the date when the deduction was taken.
Any adjustments for 2NF violations follow the same steps as for 1NF errors. First,
ensure that the attribute in error exists correctly in 1NF. Next, check the relationship
between the entity in error (here it is Payroll Deduction) and the entity in which the
attribute is correctly associated (Employee). If there is a 1:M relationship, no further
adjustments are necessary. However, if there is an M:M relationship between these
entities, verify that the intersection entity exists, and if necessary, create that derived
entity.
Almost all violations of 2NF are found in entities having more than one attribute con-
catenated together to form the entity key. It seems almost trivial to state that, for entities
with a single key, nonkey attributes must rely on that key. (For example, in a Customer
entity, you would expect to find only attributes that provide information about that cus-
tomer.) If, when the entity/attribute lists are created, keys were immediately identified and
checks made to ensure that only attributes dependent on that key were added, you need
only check entities that use concatenated attributes for keys. If that initial check was not
done, you should check all attributes in each entity to verify that attributes are dependent
on that entity’s key.
Take a look at the 1NF solution in Figure 2.4 and determine what, if any, further adjust-
ments are necessary to move to 2NF. When you are finished, compare your results to
Figure 2.5.
Attribute Description
Customer
**CustomerIdentifier The alpha-numeric string that uniquely identifies each customer
CustomerTelephoneNumber The customer’s telephone number
CustomerName The customer’s name
CustomerStreetAddress The street name associated with the customer’s account
CustomerCity The city in which the customer lives
CustomerState The state in which the customer lives
CustomerZipCode The customer’s zip code
CustomerCreditRating The credit rating for this customer
Order
**OrderNumber A unique identifier for each order
CustomerTelephoneNumber The customer’s telephone number
CustomerName The unique name for this customer
OrderDate The date when the order was placed
ShippingStreetAddress The street address for where the order is to be shipped
ShippingCity The city to which the order is to be shipped
ShippingState The state to which the order is to be shipped
ShippingZipCode The zip code associated with the shipping address
CustomerCreditCardNumber The credit card number used for this purchase
CustomerCreditCardName The customer’s name on the credit card used
ShippingDate The date the order was shipped
Advertised_Item
**ItemNumber The unique identifier for each Advertised_Item
ItemDescription A description of the item advertised
ItemDepartment A code classifying the item into one of the various product
categories of items for sale
ItemWeight The shipping weight for each item
ItemColor The color of the item
ItemPrice The selling price of the item sold
Supplier
**SupplierID A unique identifier for each supplier
SupplierName The unique name for this supplier
SupplierStreetAddress The street address for this supplier’s main office
SupplierCity The city in which the supplier’s main office is located
SupplierState The state in which the supplier’s main office is located
SupplierZipCode The zip code for the supplier’s main office
Item_Ordered
**ItemNumber The unique identifier for each item on an order
**OrderNumber A unique identifier for each order
QuantityOrdered The number of items purchased
SellingPrice The price of the item purchased
ShippingDate The date the item purchased was shipped to the customer
Restock_Item
**ItemNumber The unique identifier for each item on an order
**SupplierID A unique identifier for each supplier
PurchasePrice The current cost of this item if purchased from this supplier
If a nonkey attribute’s value can be obtained simply by knowing the value of another
nonkey attribute, the entity is not in 3NF.
For example, in the Order entity, you see the nonkey attributes
CustomerCreditCardNumber and CustomerCreditCardName. There is an obvious depen-
dency between the credit card number and the name on that credit card. A violation of
3NF exists, and CustomerCreditCardName must be removed.
How do we make this adjustment? Once again, repeat the steps covered under
1NF adjustments. Make sure the proper entity exists, then check for 1:M versus M:M
relationships. In this case, you need to create a new entity Credit_Card to store infor-
mation about the credit cards that the customer might use. The key for the entity is
CustomerCreditCardNumber, and the CustomerCreditCardName would be a nonkey
attribute.
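A minimal sketch of that adjustment in SQLite, via Python's standard library (the sample card number and name are invented):

```python
import sqlite3

# After the 3NF adjustment, the card holder's name is stored once,
# keyed by card number, instead of repeating on every order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Credit_Card (
    CustomerCreditCardNumber TEXT PRIMARY KEY,
    CustomerCreditCardName   TEXT
);
CREATE TABLE "Order" (
    OrderNumber              INTEGER PRIMARY KEY,
    CustomerCreditCardNumber TEXT
        REFERENCES Credit_Card(CustomerCreditCardNumber)
);
INSERT INTO Credit_Card VALUES ('4111-0000', 'J. Smith');
INSERT INTO "Order" VALUES (1, '4111-0000'), (2, '4111-0000');
""")

# The name is recovered through the card number at query time.
name = conn.execute("""
    SELECT c.CustomerCreditCardName
    FROM "Order" o JOIN Credit_Card c USING (CustomerCreditCardNumber)
    WHERE o.OrderNumber = 2
""").fetchone()[0]
print(name)  # J. Smith
```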
Start from the 2NF solution in Figure 2.5 and see what other adjustments are necessary
to get to 3NF, then check your solution against that shown in Figure 2.6.
By the way, try this shortcut expression for the rules of normalization.
For an entity to be in 3NF, each nonkey attribute must relate to the key, the
whole key, and nothing but the key.
The entity/attribute list is now in 3NF, and each attribute is where it belongs. Nonkey
attributes appear only once, in the entity which they describe. Attributes in key fields can
(and do) appear in several related entities. These repeated occurrences establish the vari-
ous one-to-many relationships that exist between entities. This redundancy is required
for a relational DBMS, which provides data linkage on the basis of data content only. For
example, those entities related to Advertised_Item (Ordered_Item and Restock_Item)
must all have the Advertised_Item’s key, ItemNumber, stored in them to support access/
data retrieval by ItemNumber.
It is important to keep in mind that the final data structure/design has not been
created/finalized yet. The data model is just a logical view of data elements and how
they relate to each other. This logical structure could be used to create tables and table
structures, but the final decisions on database design and table structures will come
later, after considering the results of how the data will be accessed. This is covered later
in the book.
For now, there is one final step at this part of the analysis, which clarifies your solution
by drawing a picture of the data relationships that have been created.
Attribute Description
Customer
**CustomerIdentifier The alpha-numeric string that uniquely identifies each customer
CustomerTelephoneNumber The customer’s telephone number
CustomerName The customer’s name
CustomerStreetAddress The street name associated with the customer’s account
CustomerCity The city in which the customer lives
CustomerState The state in which the customer lives
CustomerZipCode The customer’s zip code
CustomerCreditRating The credit rating for this customer
Order
**OrderNumber A unique identifier for each order
CustomerIdentifier The unique identifier for this customer
OrderDate The date when the order was placed
ShippingStreetAddress The street address for where the order is to be shipped
ShippingCity The city to which the order is to be shipped
ShippingState The state to which the order is to be shipped
ShippingZipCode The zip code associated with the shipping address
CustomerCreditCardNumber The credit card number used for this purchase
ShippingDate The date the order was shipped
Ordered_Item
**OrderNumber A unique identifier for each order
**ItemNumber The unique identifier for each Advertised_Item
QuantityOrdered The number of items purchased
SellingPrice The price of the item purchased
ShippingDate The date the item purchased was shipped to the customer
Advertised_Item
**ItemNumber The unique identifier for each Advertised_Item
ItemDescription A description of the item advertised
ItemDepartment A code classifying the item into one of the various product
categories of items for sale
ItemWeight The shipping weight for each item
ItemColor The color of the item
ItemPrice The selling price of the item sold
Supplier
**SupplierID A unique identifier for each supplier
SupplierName The unique name for this supplier
SupplierStreetAddress The street address for this supplier’s main office
SupplierCity The city in which the supplier’s main office is located
SupplierState The state in which the supplier’s main office is located
SupplierZipCode The zip code for the supplier’s main office
Restock_Item
**ItemNumber The unique identifier for each item on an order
**SupplierID A unique identifier for each supplier
ReorderPrice The current cost of new items ordered from this supplier
Credit_Card
**CustomerCreditCardNumber The credit card number used for this purchase
CustomerCreditCardName The customer’s name on the credit card used
Customer Order
In words, there is a one-to-many relationship between Customer and Order, or, for each
customer, there are many orders.
This is a fairly easy diagram to draw. Begin by simply drawing and labeling a box for
each entity. Then, for each entity, ask “Does the key of this entity exist entirely within
another entity?” If so, connect them with a line.
Draw the data model for the 3NF solution in Figure 2.6 and compare your results with
that shown in Figure 2.7.
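The drawing rule above can even be mechanized. A small sketch over a simplified, illustrative subset of the entities:

```python
# Sketch of the diagramming rule: draw a 1:M line from entity A to
# entity B whenever A's full key appears among B's attributes.
# The entity definitions below are a simplified, illustrative subset.
entities = {
    "Customer":     {"key": {"CustomerIdentifier"},
                     "attrs": {"CustomerIdentifier", "CustomerName"}},
    "Order":        {"key": {"OrderNumber"},
                     "attrs": {"OrderNumber", "CustomerIdentifier"}},
    "Ordered_Item": {"key": {"OrderNumber", "ItemNumber"},
                     "attrs": {"OrderNumber", "ItemNumber"}},
}

def one_to_many_lines(entities):
    lines = []
    for a, a_def in entities.items():
        for b, b_def in entities.items():
            # Key of the "one" side contained entirely in the "many" side.
            if a != b and a_def["key"] <= b_def["attrs"]:
                lines.append((a, b))
    return sorted(lines)

print(one_to_many_lines(entities))
# [('Customer', 'Order'), ('Order', 'Ordered_Item')]
```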
Ordered_Item Restock_Item
**OrderNumber **SupplierID
**ItemNumber **ItemNumber
QuantityOrdered ReorderPrice
SellingPrice
ShippingDate
Order
**OrderNumber
CustomerIdentifier
OrderDate
ShippingStreetAddress
ShippingCity
ShippingState
ShippingZipCode
CustomerCreditCardNumber
ShippingDate
Credit_Card
**CustomerCreditCardNumber
CustomerCreditCardName
* Note: 1:M relationships are often drawn using a >-------<< connector line.
Be very rigorous about drawing the 1:M lines and only draw them where “the key of the one
appears within the many.” When finished, check your diagram carefully to see that it “makes
sense.” You may find two entities that logically should be related but have not been connected
by a line. Adjust the data model attributes as needed to allow you to then add the 1:M line.
For example, in Figure 2.7, common sense suggests that there should be a 1:M relation-
ship between Customer and Credit_Card. One was not drawn in Figure 2.7 because the
customer’s key was not (at that time) in Credit_Card. After reviewing this, it is clear that
there should be a 1:M relationship between these two entities, so we add the customer key
(CustomerIdentifier) to Credit_Card, then draw the 1:M relationship between the two. See
the revised entity/attribute diagram in Figure 2.8.
For completeness, I will now review fourth and fifth normal forms, although in my experience, they
have limited practical application for real-world applications/databases.
Ordered_Item Restock_Item
**OrderNumber **SupplierID
**ItemNumber **ItemNumber
QuantityOrdered ReorderPrice
SellingPrice
ShippingDate
Order
**OrderNumber
CustomerIdentifier
OrderDate
ShippingStreetAddress
ShippingCity
ShippingState
ShippingZipCode
CustomerCreditCardNumber
ShippingDate
Credit_Card
**CustomerCreditCardNumber
CustomerCreditCardName
CustomerIdentifier
Pizza Delivery
**Restaurant
**Pizza_Variety <== 4NF error
**Delivery_Area <== 4NF error
If a restaurant offers all varieties at all locations, the 4NF issue is that Pizza_Variety
depends only on Restaurant and is independent of Delivery_Area. The 4NF solution
would break the above into two tables:
Restaurant Variety
**Restaurant
**Pizza_Variety
Delivery Area
**Restaurant
**Delivery_Area
Although from a theoretical point of view this might provide a cleaner definition/
delineation of data types, in practice it splits the data across two tables. As a result, when
data about restaurants, varieties, and delivery areas are needed, more I/O is required to
retrieve the equivalent information than with the 3NF solution, and application performance
is degraded.
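To make the extra work concrete, the sketch below builds the two 4NF tables and then reassembles "which varieties are deliverable where" with a join (sqlite stands in for the RDBMS; the data values are invented). The single 3NF table would hold the four combined rows directly, with no join required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# 4NF: the two independent multivalued facts live in separate tables.
conn.execute("""
    CREATE TABLE Restaurant_Variety (
        Restaurant TEXT, Pizza_Variety TEXT,
        PRIMARY KEY (Restaurant, Pizza_Variety)
    )""")
conn.execute("""
    CREATE TABLE Delivery_Area (
        Restaurant TEXT, Delivery_Area TEXT,
        PRIMARY KEY (Restaurant, Delivery_Area)
    )""")

conn.executemany("INSERT INTO Restaurant_Variety VALUES (?, ?)",
                 [("R1", "Thin Crust"), ("R1", "Deep Dish")])
conn.executemany("INSERT INTO Delivery_Area VALUES (?, ?)",
                 [("R1", "North"), ("R1", "South")])

# Reconstructing the original 3NF rows now takes a join (extra I/O).
rows = conn.execute("""
    SELECT v.Restaurant, v.Pizza_Variety, d.Delivery_Area
    FROM Restaurant_Variety v
    JOIN Delivery_Area d ON d.Restaurant = v.Restaurant
    ORDER BY 1, 2, 3""").fetchall()
# 2 varieties x 2 areas = 4 combined rows, as the single 3NF table would hold.
```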
5NF takes this one step further.
Once again, an example clarifies the intent of this requirement. The 3NF solution for
information regarding traveling salesmen, brands, and products calls for a table with the
following design:
Brand by Salesman
**Traveling_Salesman
**Brand
Once again, an implementation based on 5NF requires that the data be implemented across
more tables. Application logic processing these data will generate more I/O activity, and
performance will be degraded.
Does this matter? The answer is “It depends.”
I find it interesting to recall that “It depends” is the correct response to almost any
question on database design or implementation. Then again, after being in this
business for more than 40 years, I have seen or been involved in projects ranging
from the smallest to the largest.
For simple, small systems, the additional I/O may not make a difference. A large, complex
system, however, would (in my opinion) generally suffer from a 5NF implementation. Early
prototyping may be required to predict or measure the performance of a proposed solution
and determine whether it will be acceptable. Depending on the size and importance of the
system being designed, two steps are typically taken.
1. Vendors typically publish performance criteria that reflect how a specified configuration
will perform against a standard workload. Mapping the anticipated application data
workload against these standards can measure the effectiveness of the proposed
configuration.7
Does performance matter? For a database of sufficient size or complexity, absolutely. When
I last worked for Southwestern Bell Telephone/AT&T, I was a manager on the project to
design a database that would support anything the company might sell/offer in the future.
Our design team did an in-depth analysis of the associated data and derived a 5NF set of
entities for this environment. The resulting design contained more than 500 entities. At
this phase, we felt we could not make any performance-related changes in the data model
because we did not want to hamper/prevent more sophisticated data accesses if/when new
data requirements were identified.
When we finished the 5NF solution, we implemented it and used it with a modified bill-
ing program (revised to run against the relational tables) that had to run each night. The
nightly run took days to complete.
So where are we in terms of database design? In practice, I start with a 3NF design as
a hypothetical solution for relational tables. It is my understanding that this, in general,
is an industry practice, and 4NF and above are rarely used outside of academic circles.8
I then analyze the data accesses required to meet application requirements and modify the
logical design as needed to produce a design for physical tables. These steps will be covered
in detail in the next chapter.
QUESTIONS
1. Define the terms entity and attribute and give examples of each.
2. In developing a new data model for Customers, some having multiple offices, a data
requirement for a "Customer Location" has been identified. Is this an entity or an
attribute, and why?
Referring to the data model shown in Figure 2.8:
3. If the attribute “PreferredCreditCard” is to be added, how would the entity/attribute
list change?
4. If information were to be added to track the quality of items sold by a supplier, what
changes would be made to the data model?
5. What is the difference between the ItemPrice in Advertised_Item and SellingPrice in
Ordered_Item?
You are asked to participate in creating a logical data model for a physician’s office.
6. List five entities you would expect to see in this data model and give a description of
each.
7. For each entity, list five attributes that would be appropriate for each.
REFERENCES
1. Date, C. J., An Introduction to Database Systems, Reading, MA, Addison-Wesley, 1985, p. 367.
2. Date, C. J., An Introduction to Database Systems, Reading, MA, Addison-Wesley, 1985, p. 370.
3. Date, C. J., An Introduction to Database Systems, Reading, MA, Addison-Wesley, 1985, p. 373.
4. Fagin, R., Multivalued Dependencies and a New Normal Form for Relational Databases,
ACM Transactions on Database Systems, 2, 267, 1977.
5. Fourth normal form, Wikipedia.org, https://en.wikipedia.org/wiki/Fourth_normal_form
(accessed August 23, 2017).
6. Fifth normal form, Wikipedia.org, https://en.wikipedia.org/wiki/Fifth_normal_form
(accessed August 23, 2017).
7. Active TPC benchmarks, TPC.org, http://www.tpc.org/information/benchmarks.asp
(accessed August 23, 2017).
8. Fourth normal form, Techopedia.com, https://www.techopedia.com/definition/19453/
fourth-normal-form-4nf (accessed August 23, 2017).
Chapter 3
Database Implementation
Ordered_Item
**OrderNumber
**ItemNumber
QuantityOrdered
SellingPrice
ShippingDate

Restock_Item
**SupplierID
**ItemNumber
ReorderPrice

Order
**OrderNumber
CustomerIdentifier
OrderDate
ShippingStreetAddress
ShippingCity
ShippingState
ShippingZipCode
CustomerCreditCardNumber
ShippingDate

Credit_Card
**CustomerCreditCardNumber
CustomerCreditCardName
CustomerIdentifier
If we change the physical model to add OrderTotalCost to Order, the access paths
would be
Once again, accesses are significantly reduced at the cost of implementing logic within
a trigger or stored procedure to update this information as Supplier price updates are
received.
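One way such trigger logic might look is sketched below, using sqlite trigger syntax in place of SQL Server's; as a simplification of the scenario above, the derived OrderTotalCost is maintained as Ordered_Item rows are inserted, and the pricing rule (quantity times selling price) is an assumption for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "Order" (OrderNumber TEXT PRIMARY KEY, '
             'OrderTotalCost REAL DEFAULT 0)')
conn.execute("""
    CREATE TABLE Ordered_Item (
        OrderNumber     TEXT,
        ItemNumber      TEXT,
        QuantityOrdered INTEGER,
        SellingPrice    REAL,
        PRIMARY KEY (OrderNumber, ItemNumber)
    )""")

# Keep the derived OrderTotalCost column current as line items arrive,
# so readers of Order avoid re-summing Ordered_Item on every access.
conn.execute("""
    CREATE TRIGGER trg_order_total AFTER INSERT ON Ordered_Item
    BEGIN
        UPDATE "Order"
        SET OrderTotalCost = OrderTotalCost
                             + NEW.QuantityOrdered * NEW.SellingPrice
        WHERE OrderNumber = NEW.OrderNumber;
    END""")

conn.execute('INSERT INTO "Order" (OrderNumber) VALUES (\'O1\')')
conn.execute("INSERT INTO Ordered_Item VALUES ('O1', 'I1', 2, 10.0)")
conn.execute("INSERT INTO Ordered_Item VALUES ('O1', 'I2', 1, 5.0)")

total = conn.execute(
    'SELECT OrderTotalCost FROM "Order" WHERE OrderNumber = \'O1\''
).fetchone()[0]
```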
• When Customer information is retrieved, include their preferred credit card number.
The revised 3NF data model now represents a physical data model, showing all tables
and the columns to be defined within each.
Design modifications are almost always necessary to provide satisfactory performance
levels for the application. The earlier performance issues can be identified, the easier they
are to address without requiring software or database changes as design and testing
progress.
• Are key column(s) an absolute requirement? In most cases, yes, because database
applications inherently store and update data throughout the life cycle of the data
elements. (This is the time from when data are added to a table until they are deleted
because they are no longer needed.) Without uniqueness, we have no way of retrieving
or updating a specific row in the table.
There are, however, exceptions. If data records are being collected and stored in a
table that can be analyzed and treated as a group without the need to retrieve and
update individual rows, a key identifier/column may not be required. For example, if
bulk data are being collected that includes a column for date and time, that column
can be used to review and summarize results for that date, and that set of rows can be
deleted when no longer needed.
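As a sketch of this keyless pattern (the table and column names here are invented): rows are summarized by their date column and then deleted as a group, so no per-row key is ever needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No primary key: rows are only ever read and deleted as a group by date.
conn.execute("CREATE TABLE SensorReading (ReadingDate TEXT, Reading REAL)")
conn.executemany("INSERT INTO SensorReading VALUES (?, ?)",
                 [("2017-08-23", 1.5), ("2017-08-23", 2.5),
                  ("2017-08-24", 9.0)])

# Summarize one day's rows...
avg = conn.execute(
    "SELECT AVG(Reading) FROM SensorReading WHERE ReadingDate = '2017-08-23'"
).fetchone()[0]

# ...then delete that day as a set once it is no longer needed.
conn.execute("DELETE FROM SensorReading WHERE ReadingDate = '2017-08-23'")
remaining = conn.execute("SELECT COUNT(*) FROM SensorReading").fetchone()[0]
```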
• What data type is appropriate for each key? Are there any inherent business rules that
guide or determine acceptable values?
For example, the Customer table in the physical model has a “CustomerIdentifier”
column that will uniquely retrieve one specific row for a Customer. The user may
have business rules defining how CustomerIdentifier values are assigned for
new Customers. In that case, the storage format for CustomerIdentifier might be
“varchar(50)” (i.e., a character field varying in length up to 50 characters).
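For illustration, here is a Customer table keyed by a varchar(50) identifier, with a hypothetical business rule (identifiers begin with "C"; this rule is invented for the sketch) expressed as a CHECK constraint. sqlite stands in for the RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CustomerIdentifier as varchar(50), per the physical model; the CHECK
# constraint stands in for a user-defined assignment rule (the
# "starts with C" rule is an assumption for this sketch).
conn.execute("""
    CREATE TABLE Customer (
        CustomerIdentifier VARCHAR(50) PRIMARY KEY
            CHECK (CustomerIdentifier LIKE 'C%'),
        CustomerName VARCHAR(50)
    )""")

conn.execute("INSERT INTO Customer VALUES ('C1001', 'Wilson Grocery')")
try:
    conn.execute("INSERT INTO Customer VALUES ('1002', 'No prefix')")
    rule_enforced = False
except sqlite3.IntegrityError:
    # The database itself rejects identifiers that break the rule.
    rule_enforced = True
```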
Database Implementation ◾ 45
After all data types have been reviewed, as one last step before creating tables in a database,
you must identify indexes needed to efficiently support relationships across tables.
3.4 INDEXES
As noted earlier, RDBMSs and SQL give the user or application the ability to ask for
any data at any time. For example, the following query will retrieve all Restock_Item rows
for a specified Advertised_Item.
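The query itself is not reproduced in this extract; it presumably takes the following shape (a sketch using the column names from the physical model, with invented sample data, and sqlite standing in for the RDBMS):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Restock_Item (
        SupplierID   TEXT,
        ItemNumber   TEXT,
        ReorderPrice REAL,
        PRIMARY KEY (SupplierID, ItemNumber)
    )""")
conn.executemany("INSERT INTO Restock_Item VALUES (?, ?, ?)",
                 [("S1", "I-42", 3.00), ("S2", "I-42", 2.75),
                  ("S1", "I-99", 9.99)])

# All Restock_Item rows for one advertised item, selected by ItemNumber.
rows = conn.execute(
    "SELECT SupplierID, ReorderPrice FROM Restock_Item WHERE ItemNumber = ?",
    ("I-42",)).fetchall()
```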
If the Restock_Item table were implemented without any indexes, the above search would
still execute and return the correct information/matching rows. However, if an index were
created on the Restock_Item table's ItemNumber column, the index would be used to
immediately retrieve only the rows matching that ItemNumber. The answer to the query is
the same, but the index allows the data to be retrieved without all of the I/Os required to do
a table scan.
Identifying and creating indexes, then, are critical in obtaining acceptable performance
levels for a database application. As a general rule, indexes should be created in tables for
• The column(s) that make up the primary key for the table.
• Columns that serve as foreign keys, that is, those columns on the many side of a
one-to-many relationship.
• Columns used as search criteria in the usage path analysis.
These criteria in themselves may not result in the best possible database design, but they
will get you off to a good start. In addition, tools are available for SQL Server and Oracle to
help you monitor I/O activity while your application is being designed and tested to help
you identify and deal with unexpected performance issues.
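Applied to Restock_Item, these guidelines translate into statements like the sketch below (sqlite stands in for SQL Server or Oracle; the index name is an assumption). Note that a query on ItemNumber alone cannot use the composite primary key, whose leading column is SupplierID, which is exactly why the foreign-key column needs its own index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Restock_Item (
        SupplierID   TEXT,
        ItemNumber   TEXT,
        ReorderPrice REAL,
        PRIMARY KEY (SupplierID, ItemNumber)  -- indexed automatically
    )""")

# Foreign-key column: the "many" side of Advertised_Item 1:M Restock_Item.
conn.execute("CREATE INDEX FK_Restock_Item_Advertised_Item "
             "ON Restock_Item (ItemNumber)")

# The optimizer can now satisfy an ItemNumber search via the index
# instead of a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM Restock_Item WHERE ItemNumber = 'I-42'").fetchall()
uses_index = any("FK_Restock_Item_Advertised_Item" in str(row) for row in plan)
```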
Ordered_Item
**OrderNumber
**ItemNumber
QuantityOrdered
SellingPrice
ItemShippingDate

Restock_Item
**SupplierID
**ItemNumber
ReorderPrice

Order
**OrderNumber
CustomerIdentifier
OrderDate
ShippingStreetAddress
ShippingCity
ShippingState
ShippingZipCode
CustomerCreditCardNumber
OrderShippingDate
OrderTotalCost

Credit_Card
**CustomerCreditCardNumber
CustomerCreditCardName
CustomerIdentifier
PreferredOption
The detailed design screen will open. Under “Field Name,” type the name of the first
column (in this example, “CustomerIdentifier” in the Customer table), then click the
pull-down under “Data Type” and select “Number” to reflect a decision to use the
auto-increment feature for this column.
Continue adding column names under “Field Name,” entering an appropriate Data
Type for each. The design of the Customer table appears as follows.
When closing the table's design, you will be prompted to enter the name of the table
being saved.
In addition, you will be asked whether you want a primary key defined for the
table; reply "No."
Finally, edit the table to identify the primary key. Click on the table name, then right
click and select “Design View.” When the table opens, select the CustomerIdentifier
row, right click it, then select the “Primary Key” from the pop-up list. The table can
now be saved with that change.
Referring to Figure 3.2, repeat this process to create tables for all items in the dia-
gram. For each column, choose the appropriate data type.
Note that Access does not support varchar data types. If available, varchar data
types are much more efficient than a fixed-length implementation. For example,
if the string “Wilson Grocery” was stored with a data type of varchar(50), only
14 characters of storage would be needed/used. However, Short Text data types
are fixed length. If stored as Short Text with the default of 255 characters, “Wilson
Grocery” would use 255 bytes of storage. When using Short Text data types, it is
important to change the default length for that column to a lower, more reason-
able value. I have changed the length for all of my Short Text columns to 50.
After creating all of the tables shown in the physical data model, the database will
look like the following.
Technically, the database can be used at this moment as is. However, to improve
processing efficiency and data integrity, we need to implement indexes as well as
Referential Integrity constraints.
First, we will modify the database to add the Referential Integrity constraints. Click
on the “Database Tools” tab, then choose “Relationships,” and the following list will
appear.
Selecting all table names in the list and clicking Add produces this display.
I have rearranged the display of tables in my database to that shown on the next page.
Study the diagram on the next page and confirm that each of these one-to-many rela-
tionships exists based on the primary keys as defined.
• Between Customer and Credit_Card
• Between Customer and Order
• Between Credit_Card and Order
• Between Order and Ordered_Item
• Between Advertised_Item and Ordered_Item
• Between Advertised_Item and Restock_Item
• Between Supplier and Restock_Item
We are now ready to define the one-to-many relationships shown in Figure 3.2. In this
instance, it is simple: we ask "does the key for a given table appear within another table?"
Wherever that relationship is found, a Referential Integrity relationship must be created.
We therefore need Referential Integrity constraints
• Between Customer and Credit_Card
• Between Customer and Order
• Between Credit_Card and Order
• Between Order and Ordered_Item
• Between Advertised_Item and Ordered_Item
• Between Advertised_Item and Restock_Item
• Between Supplier and Restock_Item
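What "Enforce Referential Integrity" (plus the cascade options) buys is sketched below, with sqlite's foreign-key support standing in for Access. Orphan rows are rejected, and deleting a parent cascades to its dependents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # sqlite requires opting in

conn.execute("CREATE TABLE Customer (CustomerIdentifier TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Credit_Card (
        CustomerCreditCardNumber TEXT PRIMARY KEY,
        CustomerIdentifier TEXT
            REFERENCES Customer(CustomerIdentifier) ON DELETE CASCADE
    )""")
conn.execute("INSERT INTO Customer VALUES ('C1')")
conn.execute("INSERT INTO Credit_Card VALUES ('4111', 'C1')")

# 1. Orphans are rejected: no card may reference a nonexistent customer.
try:
    conn.execute("INSERT INTO Credit_Card VALUES ('5500', 'NO-SUCH')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# 2. Cascade: deleting the customer removes its dependent cards.
conn.execute("DELETE FROM Customer WHERE CustomerIdentifier = 'C1'")
cards_left = conn.execute("SELECT COUNT(*) FROM Credit_Card").fetchone()[0]
```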
Verify the table and column names selected for the one-to-many relationship, then
click the box to “Enforce Referential Integrity” as well as the two cascade options when
they appear. Clicking “Create” activates Referential Integrity between Customer and
Credit_Card as indicated by the line shown in the following.
Now repeat that process for each one-to-many relationship shown in Figure 3.2 and
compare your results to the following:
Note at the bottom of the display that CustomerIdentifier is not indexed. Change that
option to “Yes” and close and save that view.
Now make those changes for all columns serving as foreign keys to other tables.
We also need indexes to support significant retrievals that use columns other than the
table's key or foreign key column(s).
For example, if we found that we frequently needed to retrieve Customer information
using only the customer’s telephone number, we can create an index on that column.
As we are using a relational database management system (RDBMS), this change can
be made at any time, while the database is online or in use.
Indexes are one of the primary solutions to performance problems as they are dis-
covered in application development and testing. Remember that most RDBMSs have
monitoring and tuning tools that can help discover issues as testing progresses.
Now let us look at how to implement the physical data model using SQL Server.
* Note that the installation instructions for SQL Server are included in Chapter 7.
From the Windows “All Programs” “Start” icon, navigate to the Microsoft SQL Server
folder created for this software installation and choose SQL Server Management
Studio. Optionally, type “sql server” in the search window and select “Sql Server
Management Studio.”
After verifying that the log-on information uses a Windows account that was authorized
as an SQL Server administrator, click "Connect" to log on.
You will be presented with a menu displaying the different types of services available.
Put the cursor on “Databases,” right click and select “New Database.” You will then
be presented with the following screen showing details of installation options.
Enter the name of the new database in the “Database Name” block.
Next, use the scroll bar at the bottom of this window to move to the right to show
where the database will be installed.
Change the default configuration details as appropriate.
Double click on “Databases” and again on the name for the database just created.
• Create tables
To create a table in the database, right click on “Tables” and select “New Table” to open
a window to define columns for that new table (in this example, the Customer table).
Referring to the physical data model, enter the name of the first column in Customer
with an appropriate data type.
Click on the next row under Column Name to enter the next column and data type.
Continue adding columns for Customer until all columns have been entered.
After all columns have been defined, modify the design to show the column(s)
that define the table key. In this case, put the cursor on the box to the left of
CustomerIdentifier, then right click and choose “Set Primary Key.”
To save this design, click on the “x” at the top-right part of Table Design window.
Clicking “Yes” brings up a display where the table name can be entered.
Repeat the above-mentioned sequence to create all tables shown in the physical data
model.
You are now ready to implement Referential Integrity constraints between tables in
the database.
• Creating Referential Integrity Constraints
As was done with Microsoft Access, Referential Integrity constraints must now be
created for all dependent tables in one-to-many relationships reflected in the physical
data model.
Let us start with the relationship between Customer and Credit_Card. To create a
Referential Integrity constraint in Credit_Card:
Under Tables, select Credit_Card and then Design to alter its definition.
Next, select the column involved with the one-to-many relationship; in this case,
it is the CustomerIdentifier column (the key of Customer) that establishes/enables
that relationship.
Next, click on the arrow at the left column of CustomerIdentifier, right-click, and
select “Relationships” from the pop-up window, which generates the following:
Click on “Tables and Columns Specifications” to open a window where the col-
umn relationships are specified.
Click the box at the right side of the display containing “…” to open the following:
Using the pull-down arrows for “Primary key table,” change the Primary key
table to “Customer.”
Clicking on the first open row under Customer generates a list of all columns in
the Customer table.
Review all selections and verify the table names and columns match what is
shown in the physical design model. Click OK to create that Referential Integrity
relationship in the database.
As part of saving these changes, you will be prompted to confirm that you want these
changes to be saved.
Refer to the physical design model and continue making these adjustments for all
tables having foreign key relationships.
When adding each new relationship, make sure the "Relationship name" assigned
makes logical sense; each of mine includes the name of the Primary key table.
To verify your results, in SQL Server Management Studio, select “Views,” right
click, and then select “New View.” A list of all tables in OrderItem will be dis-
played. Do a “block select” to select all names displayed and then click “Add.” The
following display shows all one-to-many relationships currently defined in the
database. Rearranging those gives the display shown on the next page.
Creating a View of this complexity is not practical, but the display serves as a
visual aid to ensure all one-to-many relationships have been created.
After this verification, click on the “x” at the top right of the view display to close
it and respond “No” when asked if you want to save the view.
• Additional indexes
We now need to update our tables to create indexes where needed. First, check all
tables involved with foreign key relationships to ensure that the column(s) with the
foreign key have indexes. For example, in Credit_Card, CustomerIdentifier is a for-
eign key to Customer; therefore, we want to create an index on the CustomerIdentifier
column in Credit_Card.
To create this index, first use SQL Server Management Studio to navigate to the
Credit_Card table and expand the Keys and Indexes sections as shown in the following.
Note the name of the Referential Integrity constraint created earlier. SQL Server
gives the user total flexibility for naming objects, but I recommend naming
the index based on the name used for the Referential Integrity constraint to
make it easier to cross-reference items as the design is completed. In this case,
I will use FK_Credit_Card_Customer as the name of the index that will be
created.
Next, select Index, right click and select the New Index option.
Enter the name for the new index (FK_Credit_Card_Customer) and complete the
remaining items; that is, it will be NonClustered (rows in this table would not be
ordered by the CustomerIdentifier) and it will not be unique (the Customer may
have many credit cards on file).
Referring to the physical design model, create new indexes for columns in all
tables containing foreign keys.
As a final review for our first-cut physical solution, we need to review the antici-
pated call patterns for all significant user accesses to determine if any addi-
tional indexes are needed. For example, we might want to create an index on the
Customer’s CustomerTelephoneNumber to support immediate, direct retrieval of
the Customer record using their phone number.
Once again, because we are using an RDBMS, indexes can be added or deleted as
needed to maintain the highest possible performance for the application system.
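The telephone-number case can be sketched as follows (sqlite stands in for the RDBMS; the index name is invented). The index is created after the table is already populated, and the query plan switches from a full table scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerIdentifier      TEXT PRIMARY KEY,
        CustomerTelephoneNumber TEXT
    )""")
conn.execute("INSERT INTO Customer VALUES ('C1', '555-0100')")

query = "SELECT * FROM Customer WHERE CustomerTelephoneNumber = '555-0100'"

# Before: no supporting index, so the plan is a full table scan.
before = str(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())

# The index can be added at any time, without taking the table offline.
conn.execute("CREATE INDEX IX_Customer_Telephone "
             "ON Customer (CustomerTelephoneNumber)")

# After: the plan becomes an index search.
after = str(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```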
QUESTIONS
1. When creating a database table, what data type would you recommend for the vari-
ous columns with zip codes? Why?
2. When creating a database table, what data type would you recommend for the vari-
ous columns with telephone numbers?
3. If the physical data model is to be expanded to include information for individual
stores associated with each customer, describe in detail what changes need to be
made.
4. As part of the changes made for Question 3, are any changes to Referential Integrity
constraints needed? Why or why not?
5. The Supplier table in Figure 3.2 has a minimal amount of information. What col-
umns do you recommend be added?
6. Describe in detail the steps to be taken to add the new columns from Question 5
to the Supplier table. How much down time/maintenance time is required?
7. Using the physical data model in Figure 3.2, describe the sequence of tables to be
accessed to show all information regarding a customer order and all items on that
order.
8. Are any indexes needed to provide a higher level of performance for Question 7?
9. What must be changed in the physical data model to store information with customer
ratings for items purchased from each supplier?
10. Using the Physical Data Model in Figure 3.2, describe the sequence of tables to be
accessed in order to show, for an advertised item, the available suppliers for each that
have a customer rating of “4” or better.
11. If the query described in Question 10 runs too slowly, what would you do?
12. Using the physical data model in Figure 3.2, for a specific Advertised_Item, describe
the sequence of tables to find the Supplier having the lowest price for that item. Are
any indexes necessary and why?
13. A database has been created based on the physical data model in Figure 3.2, and
we find that we frequently need to find Customer records based on their telephone
number. What changes need to be made to provide a higher level of performance?
14. When using Microsoft Access, what steps are necessary to create an Index for a col-
umn in a table?
15. When using SQL Server, what steps are necessary to create an Index for a column in
a table?
Chapter 4
Normalization and
Physical Design Exercise
4.1 INTRODUCTION
The normalization process is an essential skill for business analysts participating in a data-
base design project. Without an accurate understanding and definition of what data are
needed, the design and implementation team simply cannot do their job.
As normalization will probably be new to most readers, this chapter will help develop
that skill by designing a database for a university. After reviewing data requirements,
a third normal form data model will be developed. Physical design issues will also
be reviewed, resulting in a physical design model. Implementation issues for these tables
will also be reviewed.
• Information for students currently enrolled, and transcript records for each com-
pleted course.
• Finally, for each course, information on assignments for each course as well as grades
recorded as they are completed.
Take a few minutes to create an initial list of the entities applicable to the university envi-
ronment. When finished, compare your list to that shown below.
Attribute Description
School
SchoolID A unique identifier for each School
SchoolName The name of the School
SchoolDescription A description for the scope of subjects offered by the School
SchoolCampusLocation The campus address for the Dean’s office
SchoolDeanID The FacultyID for the Dean
Department
DepartmentID A unique identifier for the Department
DepartmentName The name of the Department
DepartmentHeadID The FacultyID for the Department head
DeptCampusLocation
Degree
DegreeID A unique identifier for the Degree
DegreeName The name of the Degree
Course
CourseID A unique identifier for the Course
CourseName The name of the Course
CreditHours The number of credit hours associated with the Course
CourseDescription A general description for the subjects covered in the Course
CoursePrerequisite
CourseID A unique identifier for the Course with a prerequisite
PrerequisiteCseID The CourseID for the prerequisite
CourseOffering
CourseID The course identifier associated with this class offering
SectionID The section number associated with this class offering
FacultyID The Faculty member teaching this class offering
Location The building and room number where the class meets
ClassSchedule The day(s) and time the class meets
StudentID The Students enrolled in this class offering
Faculty
FacultyID A unique identifier for a Faculty member
DepartmentID The Department to which the Faculty member is assigned
FacultyPrefix The title associated with the Faculty member
FacultyLastName The last name of the Faculty member
FacultyMiddleName The middle name of the Faculty member
FacultyFirstName The first name of the Faculty member
FacultyStreetAddr The street address of the Faculty member
FacultyCity The city associated with the Faculty member’s mailing address
FacultyState The state associated with the Faculty member’s mailing address
FacultyZipCode The ZipCode associated with the Faculty member’s address
Curriculum
CurriculumID A unique identifier for the Curriculum
CurriculumName The name for the Curriculum
DegreeID The unique identifier for the Degree associated with this Curriculum
CurriculumDescription A description of the scope of subjects associated with the Curriculum
CourseID The Course numbers associated with this Curriculum
Job Classification
ClassificationID A unique identifier for this Job classification
ClassificationName The name commonly used for jobs with this classification
ClassificationDescription A description of the type of work associated with this job
CurriculumID The unique identifiers for Curriculums related to this job
Student
StudentID A unique identifier for each student
DepartmentID The Department identifier associated with a student
StudentLastName The last name of the Student
StudentMiddleName The middle name of the Student
StudentFirstName The first name of the Student
StudentTelephoneNumber The contact phone number for the Student
StudentStreetAddr The street address for this Student
StudentCity The city associated with the Student’s mailing address
StudentState The state associated with the Student’s mailing address
StudentZipCode The zip code associated with the Student’s mailing address
ClassLevel The current class level for the Student (e.g., Freshman)
Student Transcript
StudentID The Student identifier associated with this transcript grade
CourseID The Course number associated with this transcript grade
Grade The grade recorded for this Student and Course
DateCompleted The date this grade was reported
CreditHoursEarned The number of credit hours earned upon course completion
Assignment
CourseID The Course identifier associated with this Assignment
SectionID The Section number associated with these class assignments
AssignmentNo A unique sequential number assigned to each Assignment
AssignmentDescription A description of the work to be done to complete the assignment
AssignmentGrade The Student grades reported for each completed Assignment
Attribute Description
School
*SchoolID A unique identifier for each School
SchoolName The name of the School
SchoolDescription A description for the scope of subjects offered by the School
SchoolCampusLocation The campus address for the Dean’s office
SchoolDeanID The FacultyID for the Dean
Department
*DepartmentID A unique identifier for the Department
DepartmentName The name of the Department
DepartmentHeadID The FacultyID for the Department head
DeptCampusLocation The campus address for the Department
Degree
*DegreeID A unique identifier for the Degree
DegreeName The name of the Degree
Course
*CourseID A unique identifier for the Course
CourseName The name of the Course
CreditHours The number of credit hours associated with the Course
CourseDescription A general description for the subjects covered in the Course
CoursePrerequisite
*CourseID A unique identifier for the Course with a prerequisite
*PrerequisiteCseID The CourseID for the prerequisite
CourseOffering
*CourseID The course identifier associated with this class offering
*SectionID The section number associated with this class offering
FacultyID The Faculty member teaching this class offering
Location The building and room number where the class meets
ClassSchedule The day(s) and time the class meets
StudentID The Students enrolled in this class offering
Faculty
*FacultyID A unique identifier for a Faculty member
DepartmentID The Department to which the Faculty member is assigned
FacultyPrefix The title associated with the Faculty member
FacultyLastName The last name of the Faculty member
FacultyMiddleName The middle name of the Faculty member
FacultyFirstName The first name of the Faculty member
FacultyStreetAddr The street address of the Faculty member
FacultyCity The city associated with the Faculty member’s mailing address
FacultyState The state associated with the Faculty member’s mailing address
FacultyZipCode The ZipCode associated with the Faculty member’s address
Curriculum
*CurriculumID A unique identifier for the Curriculum
CurriculumName The name for the Curriculum
DegreeID The unique identifier for the Degree associated with this Curriculum
CurriculumDescription A description of the scope of subjects associated with the Curriculum
CourseID The Course numbers associated with this Curriculum
Job Classification
*ClassificationID A unique identifier for this Job classification
ClassificationName The name commonly used for jobs with this classification
ClassificationDescription A description of the type of work associated with this job
CurriculumID The unique identifiers for Curriculums related to this job
Student
*StudentID A unique identifier for each student
DepartmentID The Department identifier associated with a student
StudentLastName The last name of the Student
StudentMiddleName The middle name of the Student
StudentFirstName The first name of the Student
StudentTelephoneNumber The contact phone number for the Student
StudentStreetAddr The street address for this Student
StudentCity The city associated with the Student’s mailing address
StudentState The state associated with the Student’s mailing address
StudentZipCode The zip code associated with the Student’s mailing address
ClassLevel The current class level for the Student (e.g., Freshman)
Student Transcript
*StudentID The Student identifier associated with this transcript grade
*CourseID The Course number associated with this transcript grade
Grade The grade recorded for this Student and Course
DateCompleted The date this grade was reported
CreditHoursEarned The number of credit hours earned upon course completion
Assignment
*CourseID The Course identifier associated with this Assignment
*SectionID The Section number associated with these class assignments
*AssignmentNo A unique sequential number assigned to each Assignment
AssignmentDescription A description of the work to be done to complete the assignment
AssignmentGrade The Student grades reported for each completed Assignment
Unique identifiers: In creating the information in Figure 4.1, unique identifiers were
included and were simply flagged with an “*.” Of course, for some entities, more than
one attribute must be used to identify a unique occurrence of the entity.
Repeating groups: Each attribute with more than one value must be removed and
placed in the entity where it belongs. Next, we must check whether the relationship
between the two entities involved is one-to-many or many-to-many. If that relationship
is one-to-many, we are OK and simply continue, but if it is many-to-many, we
must create a new entity to store the intersection data (the attributes common to
both).
• StudentID in CourseOffering
Does StudentID appear where it belongs? (Yes, in Student).
Is the relationship between Student and CourseOffering one-to-many or many-to-many?
(Many-to-many; a student may be enrolled in multiple courses, and each
course will have more than one student.)
A new entity “Course Enrollment” is needed, along with attributes for a student
enrollment:
Course Enrollment
*CourseID
*StudentID
EnrollmentStatus
DateEnrolled
MidTermGrade
FinalGrade
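The intersection entity can be sketched directly in SQL (run here through SQLite in Python; the composite primary key enforces one enrollment row per course/student pair, and the non-key columns hold the intersection data):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Intersection entity: the composite primary key (CourseID, StudentID)
# resolves the many-to-many relationship between Student and
# CourseOffering; the non-key columns are the intersection data.
con.execute("""
CREATE TABLE CourseEnrollment (
    CourseID         TEXT NOT NULL,
    StudentID        TEXT NOT NULL,
    EnrollmentStatus TEXT,
    DateEnrolled     TEXT,
    MidTermGrade     TEXT,
    FinalGrade       TEXT,
    PRIMARY KEY (CourseID, StudentID)
)""")

# One student may appear in many courses, and one course may have many students.
con.executemany(
    "INSERT INTO CourseEnrollment (CourseID, StudentID) VALUES (?, ?)",
    [("MATH101", "S1"), ("CS101", "S1"), ("CS101", "S2")],
)
print(con.execute("SELECT COUNT(*) FROM CourseEnrollment").fetchone()[0])  # 3
```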
• CourseID in Curriculum
Does CourseID appear where it belongs? (Yes, in Course).
Is the relationship between Course and Curriculum one-to-many or many-to-many?
(Many-to-many; a Course will be relevant to multiple Curriculums, and any
Curriculum will be associated with multiple courses.)
A new “Course Curriculum” entity is required along with relevant attributes:
Course Curriculum
*CurriculumID
*CourseID
Opt-MandatoryFlag
CurriculumCseInfo
82 ◾ A Practical Guide to Database Design
• AssignmentGrade in Assignment
Is AssignmentGrade where it belongs? (No, we need a new entity “Student Grade.”)
Student Grade
*CourseID
*SectionID
*StudentID
*AssignmentID
AssignmentGrade
DateGraded
Attribute Description
School
*SchoolID A unique identifier for each School
SchoolName The name of the School
SchoolDescription A description for the scope of subjects offered by the School
SchoolCampusLocation The campus address for the Dean’s office
SchoolDeanID The FacultyID for the Dean
Department
*DepartmentID A unique identifier for the Department
DepartmentName The name of the Department
DepartmentHeadID The FacultyID for the Department head
DeptCampusLocation
Degree
*DegreeID A unique identifier for the Degree
DegreeName The name of the Degree
Course
*CourseID A unique identifier for the Course
CourseName The name of the Course
CreditHours The number of credit hours associated with the Course
CourseDescription A general description for the subjects covered in the Course
CoursePrerequisite
*CourseID A unique identifier for the Course with a prerequisite
*PrerequisiteCseID The CourseID for the prerequisite
CourseOffering
*CourseID The course identifier associated with this class offering
*SectionID The section number associated with this class offering
FacultyID The Faculty member teaching this class offering
Location The building and room number where the class meets
ClassSchedule The day(s) and time the class meets
Faculty
*FacultyID A unique identifier for a Faculty member
DepartmentID The Department to which the Faculty member is assigned
FacultyPrefix The title associated with the Faculty member
FacultyLastName The last name of the Faculty member
FacultyMiddleName The middle name of the Faculty member
FacultyFirstName The first name of the Faculty member
FacultyStreetAddr The street address of the Faculty member
FacultyCity The city associated with the Faculty member’s mailing address
FacultyState The state associated with the Faculty member’s mailing address
FacultyZipCode The ZipCode associated with the Faculty member’s address
Curriculum
*CurriculumID A unique identifier for the Curriculum
CurriculumName The name for the Curriculum
DegreeID The unique identifier for the Degree associated with this Curriculum
CurriculumDescription A description of the scope of subjects associated with the Curriculum
Job Classification
*ClassificationID A unique identifier for this Job classification
ClassificationName The name commonly used for jobs with this classification
ClassificationDescription A description of the type of work associated with this job
Student
*StudentID A unique identifier for each student
DepartmentID The Department identifier associated with a student
StudentLastName The last name of the Student
StudentMiddleName The middle name of the Student
StudentFirstName The first name of the Student
StudentTelephoneNumber The contact phone number for the Student
StudentStreetAddr The street address for this Student
StudentCity The city associated with the Student’s mailing address
StudentState The state associated with the Student’s mailing address
StudentZipCode The zip code associated with the Student’s mailing address
ClassLevel The current class level for the Student (e.g., Freshman)
Student Transcript
*StudentID The Student identifier associated with this transcript grade
*CourseID The Course number associated with this transcript grade
Grade The grade recorded for this Student and Course
DateCompleted The date this grade was reported
CreditHoursEarned The number of credit hours earned upon course completion
Assignment
*CourseID The Course identifier associated with this Assignment
*SectionID The Section number associated with these class assignments
*AssignmentNo A unique sequential number assigned to each Assignment
AssignmentDescription A description of the work to be done to complete the assignment
Course Enrollment
*CourseID The Course identifier for this enrollment record
*StudentID The Student identifier for this enrollment record
EnrollmentStatus A label giving the student’s enrollment status in this course
DateEnrolled The date the student enrolled in this course
MidTermGrade The student’s mid-term grade for this course
FinalGrade The final grade given for this student in this course
Course Curriculum
*CurriculumID The Curriculum identifier associated with this course
*CourseID The Course identifier for this course
Opt-MandatoryFlag An Optional/Mandatory flag for this Course/Curriculum relationship
CurriculumCseInfo Descriptive information on how this course is associated with this
Curriculum
Curriculum Job Classification
*CurriculumID The Curriculum identifier for this Job relationship
*ClassificationID The Classification identifier for this Curriculum relationship
Job Relationship Descriptive information about how this job title maps to this Curriculum
Student Grade
*CourseID The Course identifier for this course/student/assignment
*SectionID The Section identifier for this course/student/assignment
*StudentID The Student identifier for this course/student/assignment
*AssignmentID The Assignment number for this course/student/assignment
Assignment Grade The grade given for this student/assignment
DateGraded The date the grade was given
Now that the data model is in first normal form, let us use this information to draw the
data model and verify that it makes sense. Refer to Figure 4.3 and use this information to
draw all one-to-many relationships in the data model. Be careful to draw arrows only
when the key of one entity is embedded within another.
When you are finished, compare your results to that shown in Figure 4.4.
After drawing the logical data model, review the diagram to confirm that all one-to-many
relationships are shown and that they make sense. Do you see any issues in the
diagram?
There should be a one-to-many arrow between Department and Course, but it is not
shown above because at this moment, the key of Department (DepartmentID) does not
exist within Course.
When drawing the logical data model, do not just draw one-to-many arrows where they
intuitively belong. Draw them only where/when supported by the entity/attribute list,
check the diagram for errors/omissions, and adjust the entity/attribute list and diagram
accordingly.
Figure 4.5 shows a corrected logical data model based on the first normal form entity/
attribute list.
Now let us continue with the Normalization process.
Let us review the requirement for second normal form. As stated in Chapter 2:
For second normal form, each non-key attribute must depend on the key and all
parts of the key.
Take a minute to review Figure 4.5 and see what, if any, changes need to be made for the
data model to be in second normal form.
No changes are necessary; the data model is in second normal form.
Next, the requirement for third normal form is: “A non-key attribute must not depend
on any other non-key attribute.”
Review the data model once more to see if any changes are necessary for third normal
form.
Once again, no changes are necessary for third normal form.
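To see what a violation would look like, suppose Student Transcript had also carried CourseName. That attribute depends only on CourseID, one part of the (StudentID, CourseID) key, so the course name would be stored redundantly for every student who took the course. A sketch (SQLite via Python; the table is deliberately mis-designed for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A hypothetical, non-2NF version of Student Transcript that also
# carries CourseName, which depends only on CourseID (part of the key).
con.execute("""
CREATE TABLE BadTranscript (
    StudentID  TEXT,
    CourseID   TEXT,
    CourseName TEXT,
    Grade      TEXT,
    PRIMARY KEY (StudentID, CourseID)
)""")
con.executemany(
    "INSERT INTO BadTranscript VALUES (?, ?, ?, ?)",
    [("S1", "CS101", "Intro to Computing", "A"),
     ("S2", "CS101", "Intro to Computing", "B"),
     ("S3", "CS101", "Intro to Computing", "A")],
)

# The course name is repeated once per enrolled student; renaming the
# course would require updating every one of these rows (an update anomaly).
dupes = con.execute(
    "SELECT COUNT(DISTINCT CourseName), COUNT(*) "
    "FROM BadTranscript WHERE CourseID = 'CS101'"
).fetchone()
print(dupes)  # (1, 3)
```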
[Figure 4.5: The third-normal form logical data model, showing the entities School, Degree, Department, Course, CoursePrerequisite, Course Offering, Faculty, Curriculum, Course Curriculum, Curriculum Job Classification, Job Classification, Student, Student Transcript, Course Enrollment, Assignment, and Student Grade, with their attributes and one-to-many relationships.]
Turning to physical design, the expected usage shows heavy insert activity as weekly assignment grades are recorded. To reduce that I/O, the Student Grade entity can be denormalized by removing AssignmentNo from the key and adding one column per weekly assignment grade:
Student Grade
*CourseID
*SectionID
*StudentID
Assignment1
Assignment2
Assignment3
Assignment4
Assignment5
...
Assignment16
This implementation drastically reduces I/Os for inserting new grades and searching for
grades.
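The before-and-after behavior can be sketched as follows (SQLite via Python; the denormalized table follows the Student Grade layout above, so each week's grade becomes a single-column UPDATE of an existing row, and all of a student's grades come back in one row):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Denormalized Student Grade: one row per student per course section,
# with one column per weekly assignment (Assignment1..Assignment16).
cols = ", ".join(f"Assignment{i} TEXT" for i in range(1, 17))
con.execute(f"""
CREATE TABLE StudentGrade (
    CourseID  TEXT,
    SectionID TEXT,
    StudentID TEXT,
    {cols},
    PRIMARY KEY (CourseID, SectionID, StudentID)
)""")

# One insert when the student enrolls...
con.execute("INSERT INTO StudentGrade (CourseID, SectionID, StudentID) "
            "VALUES ('CS101', '01', 'S1')")

# ...then each week's grade is a single-column UPDATE of that same row.
con.execute("UPDATE StudentGrade SET Assignment3 = 'B+' "
            "WHERE CourseID = 'CS101' AND SectionID = '01' AND StudentID = 'S1'")

# All grades for the student are retrieved with a single-row read.
row = con.execute("SELECT Assignment3 FROM StudentGrade "
                  "WHERE StudentID = 'S1'").fetchone()
print(row[0])  # B+
```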
After making this change, the initial physical design model is given in Figure 4.6.
Note, however, the one-to-one relationship between Course Enrollment and Student
Grade. Why?
A one-to-one relationship should always raise a warning flag. If the two entities have
identical keys, why not combine all attributes into one entity?
There may be times when having two separate entities makes sense. For example, the
two entities may be populated at different points in time. Having two separate entities may
then make sense, using one to store active/current information and the second to store
older, historic data.
In this case, however, I recommend combining Course Enrollment and Student Grade,
moving all Student Grade attributes into Course Enrollment. The physical data model
reflecting this change is shown in Figure 4.7.
[Figure 4.6: The initial physical data model. It mirrors the logical model, with DepartmentID added to Course and with Student Grade denormalized to carry columns Assignment1 through Assignment16; Course Enrollment and Student Grade appear as separate tables with identical keys.]
Normalization and Physical Design Exercise ◾ 89
When implementing the aforementioned physical data model, we would of course create
indexes for columns that are unique identifiers. We would also create indexes on
attributes in a table that serve as foreign keys to another table.
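For example (a sketch using SQLite via Python; the index name is illustrative, and SQLite builds the unique index for a declared primary key automatically, so only the foreign-key column needs an explicit index):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
CREATE TABLE Faculty (
    FacultyID    TEXT PRIMARY KEY,  -- unique identifier: indexed automatically
    DepartmentID TEXT               -- foreign key to Department
)""")

# Index the foreign-key column so joins and lookups by Department
# do not require a full table scan.
con.execute("CREATE INDEX ix_faculty_department ON Faculty (DepartmentID)")

names = [r[1] for r in con.execute("PRAGMA index_list('Faculty')")]
print("ix_faculty_department" in names)  # True
```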
If our database implementation supports some sort of clustering of tables (e.g., Oracle
Tablespaces), we would want to cluster this information by grouping tables on disk based
on projected update activity and storing like data together.
The normalization process may seem overwhelming at first. The good news is that it is
really just applied common sense: entities are created for each unique object in the scope of
the design, and each attribute is associated with the entity it describes. With a little practice,
normalization will become second nature, and new data models will need minimal
adjustments to get to third normal form.
The next chapter reviews how the erwin data modeling tool can be used to create logical
data models which, in turn, are used as input to physical data models.
QUESTIONS
1. Explain the difference between a Logical Data Model and a Physical Data Model.
2. When does a one-to-many relationship exist between two entities?
3. Can a Logical Data Model contain a many-to-many relationship? Why/why not?
4. Is a one-to-one relationship valid? Why/why not?
5. When describing a column in a table, what is the meaning of the Optional or
Mandatory setting?
6. What information is considered when moving from a Logical Data Model to a
Physical Data Model?
7. What changes would you make to the Physical Data Model to add a telephone num-
ber for faculty? Give an example. What Storage format would you use?
8. What changes would you make to the Physical Data Model to add minimal grade
requirements for prerequisite courses? Would this be an Optional or Mandatory
column?
9. What changes would you make to the Physical Data Model to add email address for
students and faculty? What Storage format would you use?
10. What changes would you make to the Physical Data Model to add information on
textbooks needed for each course?
11. In Course Curriculum, what storage format would you choose for the Opt-Mandatory
column?
12. If this university had multiple campus locations, what changes would you make to
the Physical Data Model to store information about faculty positions at each campus
location?
13. If this university had multiple campus locations, what changes must be made to the
Physical Data Model to track what courses are taught at each location?
14. What changes must be made to the Physical Data Model to store information regard-
ing graduates?
15. Faculty members are optionally appointed as course developers. What changes to the
Physical Data Model are necessary to track that information?
Chapter 5
The erwin Data Modeling Tool
sense. If, however, the original designers were careless in their choice of names, the
resulting physical data model would not make sense.
For example, if the key columns for several tables were all named "SEQUENCE_NUMBER,"
the resulting display would show relationships between all of these tables and
not make any sense from the business perspective.
When this happens, the confusing/incorrect physical data model is not the fault of
erwin; rather, it is just the result of the design and naming conventions chosen when the
system was first built.
• Development environment
A development environment will be supported by a host computer and its associated
disk arrays and database. This is where development team members code and test
individual components of the system being developed.
All members of the development team will carefully record and track individual
changes to software and interfaces as they are developed. DBAs apply updates using
DDL statements and carefully record details of each individual change.
When the development team has successfully completed testing a set of updates,
change packages are prepared to migrate these changes to another environment. For
DBAs on the team, their change package will consist of the DDL for all sequential
changes made to the development database.
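Such a change package is simply the ordered list of DDL statements, replayed against the next environment. A minimal sketch (SQLite via Python; the statements themselves are illustrative):

```python
import sqlite3

# A change package: the DDL statements, in the exact order they were
# applied to the development database (statements are illustrative).
CHANGE_PACKAGE = [
    "CREATE TABLE Department (DepartmentID TEXT PRIMARY KEY,"
    " DepartmentName TEXT)",
    "ALTER TABLE Department ADD COLUMN DeptCampusLocation TEXT",
    "CREATE INDEX ix_department_name ON Department (DepartmentName)",
]

def apply_change_package(con, statements):
    # Replay the recorded DDL in order; stop at the first failure so the
    # team can research the error before re-running the package.
    for stmt in statements:
        con.execute(stmt)
    con.commit()

test_env = sqlite3.connect(":memory:")  # stand-in for the test database
apply_change_package(test_env, CHANGE_PACKAGE)

cols = [r[1] for r in test_env.execute("PRAGMA table_info('Department')")]
print(cols)  # ['DepartmentID', 'DepartmentName', 'DeptCampusLocation']
```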
• Test environment
A test environment will run on another host with its own set of disk arrays and
database.
The purpose of this environment is to test the accuracy and completeness of a change
package with changes being migrated. If errors or omissions are found, changes
are backed off, and the development team will research and resolve the issues found,
preparing a new change package to be applied to the test environment.
After a change package has been successfully tested, the team can now migrate those
changes to the production environment.
The erwin Data Modeling Tool ◾ 95
• Production environment
As the name suggests, this is the host environment for the real-time production
environment. It of course runs on another host with its own set of disk arrays and
database.
These systems often run 24×7, and some high-availability systems have an acceptable
downtime of only a few hours per year. Migration testing of changes in a test
environment is essential for the smooth migration of a change package to the production
environment with no surprises.
DBAs have online graphical user interface tools that give them the capability to make
quick and easy changes to tables, stored procedures, and triggers. However, using this type
of tool makes it difficult, if not impossible, to track all changes and then sequentially apply
those changes to another environment.
For this reason, DBAs apply all changes using a sequence of DDL updates, recording
each into a master set of changes to be applied when the change package is ready to be
applied to another environment.
For licensed users of erwin, my recommendation is to use it to first create the tables in
the initial physical data model (or reverse engineer a live system). After creating that initial
baseline, manually create DDL change packages for subsequent updates to the database
while keeping erwin up to date as changes are applied. Details on how to use erwin for
tracking changes are beyond the scope of the current book.
Let us now see how to use erwin to create the logical and physical models for the
University data model created in Chapter 4.
• Click OK to use the locally installed trial license to get this display.
You are now ready to use this display to add the entities and attributes for the
University logical data model.
To add new entities, select Entities/New to get the following.
• Under Entities, change “E/1” to the name of the first entity to be defined (in this case,
“School”).
• Use this window to add attributes for School. Start by selecting the Attributes/New
option to get the following:
• Change “<default>” to the next attribute being added (in this case, SchoolName).
• Repeat this sequence to add the other attributes for the School entity.
• Let us now set properties for these attributes. Select SchoolID, then Properties to get
the following.
Use this screen to identify the key column for the entity (in this case, SchoolID) and
to change the data type to the data type anticipated for the physical design model.
In this case, all were changed to varchar(50).
• With the varchar option, a SchoolID value of only 10 bytes requires just
10 bytes of physical storage. Using varchar therefore supports flexible yet efficient
implementations.
• Continue using this sequence of commands to add the other entities and attributes in
the University data model.
Note that at any time, the “File/Save As” feature can be used to save the logical data model.
The following display shows all of the entities after being created in the logical data
model.
• To create the logical data model diagram, we will now add each entity to the dia-
gram on the right. To add Assignment, select it in the left pane, then select “Add to
Diagram.”
• Continue adding other entities, moving each in the display and arranging them close
to associated entities.
• The next display shows the diagram with all entities; the Model Explorer window was
closed to free up space in the display.
• To add a relationship between School and Department, first select the one-to-many
icon (under the Tools label).
• Click on the key of School (SchoolID), hold the mouse button down, and drag the
arrow to SchoolID in Department.
• Check to confirm the arrow drawn connects the SchoolID attribute between School
and Department.
• Release the mouse button.
• Note, however, that the primary key for Department was changed to include the
foreign key SchoolID. SchoolID is a foreign key but does not need to be part of the
primary key.
• To make this change, select the Department entity, right-click, and select Attribute
Properties; the following window will open.
• On the SchoolID row, simply click on the check in the Primary Key column to
deselect that specification.
• Closing that window corrects the attribute specifications for Department. Note the
change in the following window.
This is the third-normal form logical data model for the University database.
As with any logical model, one-to-many relationships can only exist when the primary
key of an entity exists within another entity. In addition, the diagram must make sense; if
your business sense suggests a one-to-many relationship exists between two entities, the
primary key of the first should be added as a foreign key to the second entity.
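This foreign-key rule can be exercised directly (a sketch in Python with SQLite rather than erwin; entity names follow the School/Department example above, and note that SQLite enforces foreign keys only when the pragma is enabled):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

con.execute("CREATE TABLE School (SchoolID TEXT PRIMARY KEY, SchoolName TEXT)")

# Department (the 'many' side) carries SchoolID as a foreign key,
# not as part of its own primary key.
con.execute("""
CREATE TABLE Department (
    DepartmentID   TEXT PRIMARY KEY,
    SchoolID       TEXT NOT NULL REFERENCES School (SchoolID),
    DepartmentName TEXT
)""")

con.execute("INSERT INTO School VALUES ('ENG', 'School of Engineering')")
con.execute("INSERT INTO Department VALUES ('CS', 'ENG', 'Computer Science')")

# A Department pointing at a nonexistent School is rejected.
try:
    con.execute("INSERT INTO Department VALUES ('HIST', 'NOSUCH', 'History')")
    outcome = "accepted"
except sqlite3.IntegrityError:
    outcome = "rejected"
print(outcome)  # rejected
```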
• Move the mouse cursor through the Actions/Design Layers sequence and click on
Derive New Model.
• Under Target Database, use the pull-down to select the type of RDBMS to be used (in
this case, SQL Server).
• Clicking Next takes you to a screen to specify how much of the logical model is to be
used as input for the conversion.
• Clicking Next allows you to specify the objects to be linked in the new model.
• Clicking Next opens another window with options for Naming Standards.
This diagram now represents tables and columns to be created in a new database. This
physical model should be saved with an appropriate name.
Before we create physical tables, let us refer to the usage analysis for this model and note
the insert activity required when new Student_Grade rows are entered, and the subsequent
read activity to retrieve Student_Grade rows for review or updating.
To provide an implementation with a higher level of performance, let us de-normalize
the Student_Grade table, removing AssignmentNo while adding a column for each week's
grade. This design requires only one row in the table to hold all grades for each student and
eliminates the I/O activity when new grades are added. The physical design model now
looks like that shown on the next page.
Remember that our third-normal form data model was deliberately kept clean and
simple to give a good foundation for the original physical design. Now, however, we
must review usage requirements, anticipate the physical accesses needed to satisfy user
requirements, and make changes to the physical design as needed to improve the
performance of the system.
If there are no further changes based on initial performance predictions, we can now use
erwin to create the DDL statements that build a database containing these tables, columns,
and constraints.
• Use the cursor to follow the Actions/Forward Engineer menus and select Schema to
get this window:
• Navigate to the directory in which you want to create the DDL file and name the file
to be created:
• Clicking Save will create a DDL file to define all tables, columns, and constraints
shown in the physical design model.
erwin has the capability to directly connect to an Oracle or SQL Server instance to create
and/or manage databases, but that is beyond the scope of the current book.
To see the SQL code generated by erwin to create the schema from this physical model,
refer to Appendix C—University DDL.txt.
QUESTIONS
1. What is a data modeling tool? How does it help a design team when designing a new
database?
2. What is the difference between a logical data model and a physical data model?
3. Name two RDBMS products supported by erwin’s physical data modeling software.
4. What is reverse engineering, and when/how is it used?
5. What does the erwin reverse engineering file use as input, and what does it produce?
6. How accurate are the results of a reverse engineering process?
7. What is the change management process, and why is it important?
8. Describe the components of a change management system and describe what they do.
9. When using erwin for logical data modeling, where is the information entered that
gives a full definition/description for each attribute?
10. When defining attributes in a logical data model, what is the purpose of defining/
selecting logical data types?
11. When using erwin for logical data modeling and defining a new entity and its attri-
butes, how is the key column identified?
12. After creating tables in a logical data model, how are one-to-many relationships
identified?
13. After creating a logical data model, what needs to be done to create a matching
physical data model?
14. When creating a physical data model, when/how are referential integrity constraints
created?
15. When using erwin for physical data modeling, what needs to be done to create the
DDL to create a new database?
Chapter 6
Using Microsoft Access
6.1 OVERVIEW
Microsoft Access is a relational database management system (RDBMS) that is included as
a component in the Professional and Enterprise versions of Microsoft Office. Although it
lacks the depth of RDBMS features provided by SQL Server or Oracle, it has a number of
features which make it ideal for developing applications for many desktop users.
• Users can design tables and views and can import data into tables.
• It includes tools to help the user to quickly create queries and view and update data.
• Microsoft Access includes a graphical user interface (GUI) allowing the user to
quickly create reasonably complex interfaces.
• As it is embedded in higher end Microsoft Office packages, it is available for use by all
users in the Department of Defense and Intelligence communities.
• Microsoft Access supports a somewhat limited version of SQL. However, a complex
query can often be accomplished by combining multiple smaller, simpler views to
create the more complex result.
Access can be used to develop and deploy database management system (DBMS) tools for
a team of users. One database can be designed as the team GUI containing the query and
update mechanisms needed. This database would contain a link to another Access data-
base that contains the table(s) being viewed/updated. Deploying the Access GUI to each
user’s computer would allow each user to view and/or update the single database instance
with the data shared by the user community.
• Tables in Access can be defined as links to tables that reside in another RDBMS.
Mechanisms can then be created in Access to view and update data, for example,
from an SQL Server table.
Let us begin by using the OrderItem database created in Chapter 3.
• In the Customer table, the DataType for the CustomerIdentifier was changed from
AutoNumber (used in that chapter to illustrate that feature) to ShortText with a
length of 50 bytes. This format is more realistic and flexible for that column and
presents fewer issues when generating Forms and Reports.
• In the Advertised_Item table, the ItemWeight and ItemColor columns were removed.
With those changes, the OrderItem database now contains the following structure:
• For quick development and testing, Access allows the user to open a table directly
and enter new rows, or to modify data already loaded.
Using Microsoft Access ◾ 119
Note that when loading tables involved in one-to-many relationships, rows on
the primary (one) side of the relationship must be added before rows on the many
side. For example, a Customer record must be created before any Credit_Card
rows with that CustomerIdentifier can be added.
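The required load order can be demonstrated with a small sketch (SQLite via Python rather than Access; column names are simplified from the OrderItem design):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")

con.execute("CREATE TABLE Customer (CustomerIdentifier TEXT PRIMARY KEY)")
con.execute("""
CREATE TABLE Credit_Card (
    CustomerIdentifier TEXT REFERENCES Customer (CustomerIdentifier),
    CardNumber         TEXT,
    PRIMARY KEY (CustomerIdentifier, CardNumber)
)""")

# Loading the 'many' side first fails: the parent row does not exist yet.
try:
    con.execute("INSERT INTO Credit_Card VALUES ('C100', '4111-0000')")
    first_attempt = "loaded"
except sqlite3.IntegrityError:
    first_attempt = "rejected"

# Load the parent first; then the same child row succeeds.
con.execute("INSERT INTO Customer VALUES ('C100')")
con.execute("INSERT INTO Credit_Card VALUES ('C100', '4111-0000')")
print(first_attempt)  # rejected
```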
• Tables are often loaded by importing data from an external file. To load Customer
data, select Customer, then use the “External Data” tab to select the source type.
The most common sources for table imports are Excel files. Choosing the Excel file
opens a window allowing you to navigate to and select the file to be imported.
The import mechanism will attempt to match the table design to the file content and
gives the user the option to skip/ignore columns in the file being imported.
As an option, the user may import the file as is into a new table using column
names found on the first line of the input file. After importing the file, the user
can then run an SQL command using the "INSERT INTO <table1>
(<column names>) SELECT <column names> FROM <table2>" syntax to take
rows from the table just created and add them to the table to be updated.
The import wizard also makes it easy to import data from delimited flat files, where
columns of information are separated by special characters (e.g., tab characters).
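The two-step load described above can be sketched as follows (SQLite via Python rather than Access; table and column names are illustrative, and the INSERT ... SELECT names the target columns while taking its rows from the staging table):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Target table, plus a staging table holding the imported file as is.
con.execute("CREATE TABLE Customer (CustomerIdentifier TEXT, CustomerName TEXT)")
con.execute("CREATE TABLE ImportStaging "
            "(CustomerIdentifier TEXT, CustomerName TEXT, UnwantedColumn TEXT)")
con.executemany("INSERT INTO ImportStaging VALUES (?, ?, ?)",
                [("C1", "Acme Corp", "x"), ("C2", "Globex", "y")])

# Copy only the wanted columns from the staging table into the target table.
con.execute("INSERT INTO Customer (CustomerIdentifier, CustomerName) "
            "SELECT CustomerIdentifier, CustomerName FROM ImportStaging")

loaded = con.execute("SELECT COUNT(*) FROM Customer").fetchone()[0]
print(loaded)  # 2
```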
• By using the pull-down menu for Tables, select the primary table of interest (in this
case, Customer).
• Individual column names can be selected for use by selecting the column name and
clicking ">"; clicking ">>" will add all columns to the query being built.
• Clicking Next brings up a screen allowing you to see the results of the query being
developed, or you can choose the Modify option to continue developing the query.
• To add the Credit_Card table to the query, click in the design pane, then right-click
and select the “Show Table” option to bring up the following.
• Select the table to be added to the query design (Credit_Card) and click Add. The
“Show Table” menu will remain open to add other tables; click “Close” to close that
panel.
• The query will now show all Customers and their associated Credit_Card rows. Click
the View option to see the data returned by the query as it stands.
Optionally, you can also select “SQL View” to see the SQL statement that will be used
to run the query.
Delete or rearrange the order of columns if desired; when finished, clicking x on the
Customer Query line will close the query, giving you the option to rename it.
• Use the Query wizard to start building a query on the Restock_Item table, and when
the panel opens, switch to SQL View. Next, replace the code displayed with the fol-
lowing SQL command:
• At the top of the pane, select the “Update” button, then close the query and give it an
appropriate name.
Let us see how a query can be created and used to view and filter prices from Suppliers
for an Advertised_Item.
• By using the Query wizard, start building a query for the Advertised_Item table.
• Next, click in the design pane and add the Restock_Item and Supplier tables.
• Update the query to display the Supplier name and the ReorderPrice.
• To see prices for one item (e.g., ItemNumber Office1111), click on the pull-down for
ItemNumber to see a list of all values for that column; deselect all values and select
only Office1111.
• Alternatively, select the Text Filters/Equals sequence:
• Entering Office1111 filters the output display to show the prices offered by the four
Suppliers in the database.
• Right-clicking on this row opens a pop-up menu containing the “Delete Record”
option; click that option to delete the row selected.
• Create a table that will be used to hold the key for the Advertised_Item of interest.
I have named this table beginning with the word "Selected" to set it apart from
typical application data tables ("Selected Advertised_Item"). This table will have one
column with the same storage format as Advertised_Item’s ItemNumber.
• Next, we will create a form that can look up ItemNumber values from Advertised_
Item and store the selected value in this table.
• After that value is stored, we will then open another form to display and edit the col-
umns in the selected Advertised_Item row.
Let us begin by creating a form to set the value in the “Selected Advertised_Item” table.
• Click on the “Selected Advertised_Item” table name, then click the Create/Form icon.
• At the top left, click on View and Design View to switch to a design view of this form.
• Next, choose the Combo box icon from the Design tab.
• Clicking on the location where ItemNumber was originally located starts a wizard for
the drop-down function for the Combo box.
• Click Next to see a list of data sources for the lookup. Choose the Advertised_Item
table.
• Next, move the column to be retrieved (ItemNumber) to the right pane by clicking
“>.”
• Clicking Next allows you to store the data retrieved with a name.
• Rearrange and resize the placement of the element as desired. Leave enough space
above Form Footer to add a “Save” button later.
• Save the form with the current design, then click the View/Form View option to see
how it works.
• Selecting a value and closing the form will store that value in the “Selected Advertised_
Item” table.
• Next, create a query linking this table (once populated) to Advertised_Item and save
it with an appropriate name (in this case, “Update Advertised_Item” as that is the
intended use of the query and associated form).
• After saving the query, select it as a source in the left pane, then click on Create/Form
Wizard.
• Click View/Design View to edit the layout or size of the data elements displayed.
Expand the display over the Form Footer label to leave room for a button/command
to be added.
Note that this form allows for all Advertised_Item columns to be edited. However, the
ItemNumber column is the key field for this table and should not be changed. If the item
advertised changes significantly, a new Advertised_Item should be created and an older,
now obsolete item should be deleted.
To prevent alteration of the ItemNumber column:
• Put the form in Design mode, right-click ItemNumber, and click Properties. Scroll
down the “All” tab until the Locked entry is found.
• Next, select the "Button" box and click in the Design panel slightly above Form
Footer. Immediately cancel the wizard that opens and save the change, giving the box
the name "Select."
• Next, click on the newly created “Select” box, right-click on the surrounding orange
perimeter, and select “Build Event.”
• Enter commands to first close this form (thereby saving the selected ItemNumber),
then to open the "Update Advertised_Item" form to view/edit that information.
' Close this form, saving the ItemNumber selected
DoCmd.Close acForm, "Select Advertised_Item", acSaveYes
' Open the form to view/edit the selected item
DoCmd.OpenForm "Update Advertised_Item"
• Now repeat this process with the “Update Advertised Item” form to add a button clos-
ing this form (after edits have been made) and to open the “Select Advertised_Item”
form to perform more edits.
6.5.2 Create a Form to Add a New Customer
Creating a Form to add a new Customer row is fairly simple but once again involves the use
of several Access objects to capture the data and add it to the Customer table. The following
steps create the objects needed for the Form.
• First, create a work table (“New Customer”) to hold/capture data elements for the new
Customer.
• Open the Customer table in Design mode and do a “File/Save As” to create a table
named “New Customer.”
• Next, create a query to add the row from “New Customer” into “Customer.”
• Use the Query wizard to create a new query, switch to SQL View, and enter the fol-
lowing command:
INSERT INTO Customer
SELECT *
FROM [New Customer];
After adding a Customer, we will also need to add a row to the Credit_Card table to record
the credit card number included in the Customer table.
• Use the Query wizard to create a query, switch to SQL View mode, and enter the fol-
lowing command.
DELETE *
FROM [New Customer];
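The overall staged-insert workflow can be sketched with Python's sqlite3 module as a stand-in (table and column names are simplified; note that standard SQL writes the Access statement DELETE * FROM [New Customer] as DELETE FROM, without the asterisk):

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Work table capturing the data entered on the form, plus the real table.
con.execute("CREATE TABLE NewCustomer (CustomerIdentifier TEXT, Name TEXT)")
con.execute("CREATE TABLE Customer (CustomerIdentifier TEXT PRIMARY KEY, Name TEXT)")

# The form writes the captured values into the work table.
con.execute("INSERT INTO NewCustomer VALUES ('C1', 'Acme Corp')")

# 1. Append the captured row to Customer (the INSERT INTO ... SELECT query).
con.execute("INSERT INTO Customer SELECT * FROM NewCustomer")

# 2. Clear the work table so it is empty for the next entry (the Delete query).
con.execute("DELETE FROM NewCustomer")

print(con.execute("SELECT COUNT(*) FROM Customer").fetchone()[0])     # 1
print(con.execute("SELECT COUNT(*) FROM NewCustomer").fetchone()[0])  # 0
```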
• Save this query as "Delete New Customer Info," designating it as a Delete query.
• Put the form in Design mode and adjust the length of the data elements. In addition,
increase the space above the Form Footer to allow space for a button.
• Click on Design/Button, then click in the blank space above Form Footer where you
want to add the button to open the Create Button Wizard.
• Select "Cancel" to stop the wizard, leaving a blank button on the form.
• Next, put the cursor on the yellow box around the button, right-click, and select
“Build Event.”
• Pick “Code Builder” and click OK. When the window opens, add commands to
• Close the form (saving data that was entered).
• Turn automatic warnings off (for the update queries about to be run).
• Execute the query to insert the new Customer using the “New Customer” table.
• Execute the query to add the customer’s credit card.
• Execute the query to delete information in the “New Customer” table.
• Turn warnings back on.
• Save the form with an appropriate name; it is now ready for use.
• Identify the Order number involved. Preferably, this can be selected by a drop-down
list rather than entering the Order number manually.
• Use a Query to find the Customer information and all items on the order.
• Generate a customized Report using that query to display the columns of informa-
tion desired.
Let us start by creating a table to hold the value of the order number of interest.
• Open the Order table in design mode and note the format for the OrderNumber
column; in this case, it is a text column 50 bytes long.
• You can do a “Save As” to create a “Selected Order” table and then remove all col-
umns but OrderNumber, or close the Order table and use the Create Table wizard
to create a new “Selected Order” table having one column, OrderNumber, with a
data format of 50 text bytes.
• The query will include “Selected Order” (to link to Order), Order, Customer (to get
the Customer name), Ordered_Item (to get the quantity ordered), and Advertised_
Item (to get the item name). Study the following diagram and note the columns from
each table that are selected to be displayed.
• Save the query with an appropriate name; here, the name “Order Items” was used.
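The joins behind this query can be sketched with Python's sqlite3 module (column names follow the text; the data values are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SelectedOrder (OrderNumber TEXT);
CREATE TABLE "Order" (OrderNumber TEXT PRIMARY KEY, CustomerIdentifier TEXT);
CREATE TABLE Customer (CustomerIdentifier TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Ordered_Item (OrderNumber TEXT, ItemNumber TEXT, QuantityOrdered INTEGER);
CREATE TABLE Advertised_Item (ItemNumber TEXT PRIMARY KEY, ItemDescription TEXT);

INSERT INTO Customer VALUES ('C1', 'Acme Corp');
INSERT INTO "Order" VALUES ('O-100', 'C1');
INSERT INTO Ordered_Item VALUES ('O-100', 'Office1111', 2);
INSERT INTO Advertised_Item VALUES ('Office1111', 'Desk lamp');
INSERT INTO SelectedOrder VALUES ('O-100');  -- the order number chosen on the form
""")

# Link Selected Order -> Order -> Customer, and Order -> Ordered_Item -> Advertised_Item.
rows = con.execute("""
SELECT o.OrderNumber, c.Name, ai.ItemDescription, oi.QuantityOrdered
FROM SelectedOrder s
JOIN "Order" o          ON o.OrderNumber = s.OrderNumber
JOIN Customer c         ON c.CustomerIdentifier = o.CustomerIdentifier
JOIN Ordered_Item oi    ON oi.OrderNumber = o.OrderNumber
JOIN Advertised_Item ai ON ai.ItemNumber = oi.ItemNumber
""").fetchall()
print(rows)  # [('O-100', 'Acme Corp', 'Desk lamp', 2)]
```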
We need a way to specify the OrderNumber of interest; we will do that with a Form.
• Select the “Selected Order” table name, then click on Create/Form to open the Form
wizard.
• Save the Form wizard and reopen it in Design mode.
• Select and delete the OrderNumber row.
• Go to the Design tab and select the Combo box icon, then click in the form where
OrderNumber appeared.
• Use the wizard to select the Order table, then OrderNumber to retrieve those values,
storing them in this location in the Form.
• Next, add a Button with the name “Select” at the bottom of the form. Cancel the
Button wizard when it opens; we will update the button later with actions to be taken
after the OrderNumber is selected.
• Save the Form with an appropriate name (e.g., “Identify Order Number”).
• Select the Order Items query, then click on the Create/Report to open the Report
wizard. Note that the display opens showing all columns in one line.
We need to modify this output to add a drop-down box to save a value in “Selected
Order” and rearrange the column display into something more readable.
• Save the Report, then reopen it in Design Mode.
• Note that the report has column names in the Header section, with all column values
in the Detail portion.
We want to rearrange the format to display order-specific information (the Order
number and Customer information) in the Header, leaving in the Detail portion
those data elements with more than one row (ItemDescription and QuantityOrdered).
• With the Report in Design mode, carefully move the cursor over the portion for the
Page Header until the “+”-like symbol appears. Click the mouse and drag the symbol
down to enlarge the Header portion.
• In the Detail portion, pick an item to be moved to the Header. Click on it, copy that
item, then move the mouse and click in the Header, and do a paste to add the item to
that section. Finally, delete the item from the Detail section.
The following display shows OrderNumber after being moved.
• Repeat the copy/paste operation on the Customer elements to move them all to the
Header. Resize each as appropriate.
In addition, delete the ItemDescription and QuantityOrdered labels from the Header.
• Finally, select the Detail section and enlarge it slightly. Then move the ItemDescription
and QuantityOrdered elements to the left and resize them accordingly.
Note that the Header now includes information about the order as a whole, and the
Detail section contains information about the individual items on the order. “View”
the report to check the display of the items it contains, and Save and close the report
when you are finished.
We now have to make one final modification to tie all of this together.
• Go back to the Form just created (“Identify Order Number”) and put it in “Design
View” mode.
• Select the “Save” button added at the bottom, right-click on the orange box, and select
“Build Event.”
• When this Form is initially opened, we will use the drop-down menu from the
Combo box to select an OrderNumber. After that number is selected, we just need
to close this form (and thereby update the Query results) and open the Report to see
the results.
To perform these functions, enter the following commands in the code section of the
above-mentioned window:
The following screens give an example of how the Query/Form/Report elements work
together.
Alternatively, after the “Selected Order Number” table has been updated, the Report can
be opened manually.
Hopefully, you now see how Microsoft Access can be used to create a reasonably sophisti-
cated GUI for users. It just takes a bit of thought and analysis to identify and design all of
the individual components and queries required.
• Access does not support complex queries, but often several small, simple queries can
be created that together produce the desired result.
• When building multiple tools to achieve a result, naming conventions can help iden-
tify associated items in Queries, Forms, and Reports and make the overall system
more understandable and easier to maintain.
• The first database would host (contain) only the Table objects (the core data tables
shared by the user community).
• The second database would serve as a team GUI and contain all of the objects and
code used to access or analyze the data.
• The Tables section of the GUI database would be empty when initially created.
The External Data/New Data Source icons would then be selected.
• Select the “Link” option as shown and click OK. Next, browse the network shared
by users to find and select the data portion of the database. Select all tables in that
database.
• Linked tables will now appear in the Tables section of the GUI database.
• That GUI database can now be given to each user who needs access to the database.
• The wizard needs a DSN name to identify the type of ODBC driver needed to access
the data.
• If one has not already been created, clicking “New” presents a pull-down list of
DBMSs that are supported. “ODBC for Oracle” and “SQL Server” are both on the list.
• Select the appropriate ODBC driver and navigate on the user network to the location
where the database is installed. You will be asked to log on to that database.
• When finished, the tables selected will appear as linked objects in the Table section
of the GUI database.
As a result, the GUI database now provides all of the Windows-based functions for users
with the data component running in SQL Server or Oracle.
• Microsoft Access is deployed as part of the desktop environment for all users in
the Department of Defense and in the Intelligence community. It therefore is a common,
free software environment for use by analytical teams.
• The linked table and Pass-Through options for queries support an environment where
complex queries can be run against industrial strength database platforms such as
SQL Server and Oracle.
• Users often have difficulty describing what functionality is needed in a new system.
A design team can rapidly design and develop both tables and user interface mecha-
nisms in a prototype that users can see and react to, in order to determine core user
requirements.
QUESTIONS
1. What software product(s) are required to install and use Microsoft Access? Is a
Microsoft Access database a logical or physical database?
2. Where is a Microsoft Access database stored?
3. How do you assign the DataType for a column in a table?
4. When creating a table, how are key column(s) identified?
5. When defining a column in a table, how can an index be created on that column?
6. When defining an index on a column, does that index have to be unique?
7. Can a table be created that does not have a unique key?
8. When creating a table whose key column is an ever-increasing numeric value, what
DataType should be used for that column?
9. What steps are taken to import data into a table?
10. What steps must be taken to create a Query for three linked tables in the database?
11. Can columns in a table be updated using a Query?
12. What steps are followed to create an Update query?
13. How can you use Microsoft Access to view data in a table in an SQL Server database?
14. How can you run a query against a table in an SQL Server database using SQL Server-
specific SQL?
15. How can a table be manually browsed or filtered?
Chapter 7
7.1 OVERVIEW
SQL Server is a relational database management system (RDBMS) developed by Microsoft
that runs on Windows platforms ranging from a laptop to a dedicated database server.
It comes with a graphical user interface (GUI—the SQL Server Management Studio) that
makes it easy to create and manage individual databases running on that platform. Although
primarily aimed at small-to-mid-range applications, it is suitable for large, complex Internet
applications with multiple concurrent users.
7.1.1 Advantages
• SQL Server is relatively inexpensive and is easy to install. Trial versions are available
to download and install, as well as a free entry-level “Express” version.
• The SQL Server Management Studio provides a central tool to define tables, indexes,
views, and referential integrity constraints within a database.
• SQL Server handles complex queries and supports functions and stored procedures.
• SQL Server does not require the care and feeding from database administrators
(DBAs) typically required with Oracle.
• SQL Server supports full-text search against character-based columns, allowing the
user to search for a word embedded in a column.
• SQL Server also supports the ability to replicate a database on a remote host and
supports failover to that host should the primary host fail.
Design changes must be carefully developed and deployed to minimize downtime for
the upgrade and to avoid surprises in the upgrade process.
• Changes often include table modifications as well as to code (stored procedures, func-
tions, views, or to the web interface).
• Individual changes using the Management Studio must be carefully documented to
capture the sequence and details involved with each change.
• To properly develop and test design changes, you need additional computer platforms
to host development and test versions of the database.
• The development platform is used to test individual changes as they take place.
Later, all planned changes are combined together in a change package to upgrade
the application from one level of functionality to the next.
• The test platform is where the change package is applied and tested. If the change
package is applied successfully, it is ready for use in upgrading the production
platform with minimal downtime for production users. If errors are found on the
test platform, the change package must be corrected and retested.
• After successfully testing a change package, that package is ready for use in updat-
ing the production platform.
In the current chapter, we will review key factors in the creation and use of an SQL Server
database.
I recommend using SQL Server accounts to control all database access. This gives
DBAs total control of accounts and how they are used.
• Installing SQL Server on a laptop: As mentioned earlier, SQL Server can be installed
on a laptop or desktop. Although these platforms have limitations in terms of perfor-
mance and protection against hardware failures, they provide a database platform at
minimal hardware expense.
After installation, periodic database backups must be taken. If possible, they should
be stored on a different drive from the database itself to protect against device failure.
• If only one drive is available, consider using a zip drive to store a backup of
the database.
• If two drives are available, install the database and all its files on one drive and the
log files and periodic backups on the second.
• Installing SQL Server on a server: If the server is configured with multiple drives, we
want to separate as much as possible the RDBMS system software, data files, indexes,
database log files, and files created from database backups.
Fortunately, systems today are more commonly built using RAID technology.
• RAID 5 configurations use a combination of striping (for performance) and
parity checking (for recoverability) and protect against single-device failure. As
data are striped, performance is not as much of an issue, and device separation
for files is not important; the system will continue to run after the failure of one drive.
• RAID 10 configurations combine disk mirroring and disk striping and support a
higher level of performance while protecting against single-device failures. They
require twice as many drives as a non-RAID installation but avoid the overhead
required for RAID 5 parity when updates are made.
When using RAID arrays, all files for the RDBMS and its databases can be placed on
the same RAID array.
• Prerequisites: As you might guess, SQL Server has specific hardware and software
prerequisites for each version. These can be found by using the Internet to request a
download, then reading the “System Requirements” section from the download page.
The next section describes the steps necessary to install a trial version of the SQL
Server 2008 R2 “Enterprise Edition” on a laptop. After downloading and extracting
the software, the installer displays a number of options in the left pane; the
planning options are shown on the next page. Clicking "Hardware and Software
Requirements" uses the Internet as a resource to download and display this information. Clicking
on “System Configuration Checker” will check the computer being used to see if all
requirements have been met. If the computer is connected to the Internet and some
software is needed, in most cases, that software will automatically be downloaded
and installed.
Clicking on the first link to install a “New SQL Server stand-alone installation…” starts the
installation wizard, showing features available.
The following options were chosen to install a relatively simple database without the
Full-Text Search or Replication features.
Selecting Next gives the user the option of changing the name of the SQL Server Instance
being installed; I have chosen the default option of MSSQLSERVER.
Note that for security reasons, the user may choose to change the default name to
make it more difficult for a hacker/intruder to identify any database instance running
on the host.
Click OK to display a screen to enter a product key, if available. Click Next to install a trial
version.
Accept the terms and click Next to advance to the next screen.
Using SQL Server ◾ 165
Click Install to begin the installation process. The next screen displays the results of a
system check before installing the software.
Click Next to see a display asking for the names of accounts used to manage SQL Server.
Click the button to “Use the same account for all SQL Server services.”
I have chosen the option shown in the following.
Clicking Next now displays options for accounts to be used. I recommend “Mixed Mode,”
preferring to control all data access using SQL Server account names for users.
Note also a section to enter the password for the SQL Server default Administrator
account “sa.”
SQLSvrLogs: The SQLSvrLogs database has two tables storing information about
warning messages and error messages from an SQL Server instance. Here is how this
database is updated and used:
• Chapter 8 contains a Perl script designed to run continuously to extract warning
and error messages from an SQL Server log file. It also describes how the bcp util-
ity is used to load the contents of these files into tables.
• Chapter 11 describes how to use PHP to develop a web-based interface allowing
a DBA or administrator to review warning and error messages. When reviewing
new messages, status flags are set appropriately to keep track of records that have
been reviewed, those that are being researched to see if corrective action is neces-
sary, and those where the underlying problem/issue has been resolved.
Section 7.3.1 shows how to create a database with the WarningRecords and ErrorRecords
tables. Although very simple in nature, these tables are interesting in several respects.
• Each table has one long varchar column storing the information found in the
log record. Nothing in the log format is suitable for use as a unique identifier. As
such, the “RecordKey” column was defined with an “int” data format to support
ever-increasing numeric values automatically incremented as rows are inserted
into the table.
• Users will use the web interface to review all Warning and Error log records and
set flags for each row to label/classify the row as follows:
– Reviewed (and can be ignored).
– Pending (need to be researched further to determine if any corrective action
is needed).
– Resolved (and can subsequently be ignored from future reviews).
• The table design must include columns for each of these status flags using a “bit”
data format. In addition, as part of the process when loading new rows, a Stored
Procedure is needed to reset the null values for new rows to “0” (off).
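The body of that Stored Procedure amounts to a single UPDATE per table. A sketch using Python's sqlite3 module as a stand-in (SQL Server would declare the flags with the bit data type and wrap the UPDATE in a Stored Procedure):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE WarningRecords (
    RecordKey INTEGER PRIMARY KEY AUTOINCREMENT,  -- ever-increasing numeric key
    LogText   TEXT,
    Reviewed  INTEGER,  -- "bit" flags in SQL Server; NULL on freshly loaded rows
    Pending   INTEGER,
    Resolved  INTEGER)""")

# Newly imported rows arrive with NULL status flags.
con.execute("INSERT INTO WarningRecords (LogText) VALUES ('disk nearly full')")

# The Stored Procedure's logic: reset NULL flags on new rows to 0 (off).
con.execute("""UPDATE WarningRecords
               SET Reviewed = 0, Pending = 0, Resolved = 0
               WHERE Reviewed IS NULL""")

print(con.execute("SELECT Reviewed, Pending, Resolved FROM WarningRecords").fetchone())
# (0, 0, 0)
```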
The University Database: Chapter 4 describes the design of a University database
to store information about courses offered, students enrolled, and student grades.
Section 7.3.2 describes how to create a database in SQL Server to manage this infor-
mation based on the physical data model shown in Figure 4.7.
• In practice, new records for each table would be obtained by running a PERL script
24 × 7 modeled after the one shown in Section 8.5. New files are moved to the input
directory referenced in the next paragraph.
• A script is needed on the database host to continually monitor an input directory
for new files. When new files are found, they would be imported into the respective
table, and a Stored Procedure run to initialize the Reviewed/Pending/Resolved flags
described earlier.
Here are the steps required to create an SQLSvrLogs database with these two tables.
• Start SQL Server Management Studio, select Databases, right-click, and choose New
Database.
Use this window to define the following columns for the WarningRecords table:
Note the “Allow Nulls” settings; this permits new rows to be added to the table with no
(null) values for these columns.
After defining each column, click the x at the top right of the table definition window to
close the window. You will be asked if you want to Save the table; choose Yes, then enter a
name for the table being saved (WarningRecords).
Repeat this sequence to define the ErrorRecords table.
For this application, the web interface executes queries directly against each table. There are
no subordinate/linked tables; therefore, no Referential Integrity constraints are required.
The only additional item needed is the Stored Procedure used to initialize the three sta-
tus flags after new rows are added to the two tables.
It will identify all newly imported rows (having status flags of “null”), and reset each to
zero. Section 7.10 shows how to create this Stored Procedure.
• In general, most columns should be defined using a Data Type of varchar(50). This
results in using a variable number of bytes per column to store a character string up
to 50 bytes long.
• Where appropriate, pick an appropriate Data Type of Integer or Date/Time.
• Click the “Allow Nulls” box for columns where information may not be available
when the tables are initially loaded.
For example, here are the column definitions for the School table.
• After entering all column names for a table, identify the column(s) used as a key for
the table. If the key is only one column, click on the line with that column, right-click,
and select “Set Primary Key.”
After all columns and the primary key are defined, select the x at the top right of the
design window to close and save the definition, entering a table name when prompted.
If the table requires more than one column for the key (e.g., the Assignment table), select the
first column in the key, hold the shift key down, and select the last column. In Assignment,
the first three columns are used in combination to identify a unique assignment.
After doing the group select shown earlier, right-click, and select the “Set Primary
Key” option.
Finally, click on the x at the top right of the design window to close the table design;
enter the table name when prompted.
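The effect of a multi-column (composite) primary key can be sketched with Python's sqlite3 module (the Assignment column names below are assumptions for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Three columns in combination identify a unique assignment (hypothetical names).
con.execute("""CREATE TABLE Assignment (
    FacultyID  TEXT,
    CourseID   TEXT,
    SemesterID TEXT,
    PRIMARY KEY (FacultyID, CourseID, SemesterID))""")

con.execute("INSERT INTO Assignment VALUES ('F1', 'CS101', '2024A')")

# The same three-column combination violates the composite key...
try:
    con.execute("INSERT INTO Assignment VALUES ('F1', 'CS101', '2024A')")
except sqlite3.IntegrityError:
    print("duplicate composite key rejected")

# ...but changing any one of the three columns yields a distinct, valid key.
con.execute("INSERT INTO Assignment VALUES ('F1', 'CS101', '2024B')")
print(con.execute("SELECT COUNT(*) FROM Assignment").fetchone()[0])  # 2
```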
Continue adding table definitions for all tables shown in Figure 4.7 and compare your
results to the following.
• As with Microsoft Access, indexes are automatically created for the primary keys
of tables.
• Indexes are also needed to support frequently used access paths anticipated by the
design team. As one example, school administrators frequently need to see all stu-
dents enrolled in a specific department.
• If a query were executed specifying all students for a specific department and
the Student table had no index on the DepartmentID column, the Student table
would be scanned and only rows matching the DepartmentID qualifier would be
returned. This would require a significant amount of I/O activity.
• If the same query were executed after an index was created on the DepartmentID
column in Student, then the RDBMS would use the index to retrieve only stu-
dents within the specified department, requiring significantly fewer I/Os.
To create an index on the DepartmentID column in Student, double-click on the
Student table to show all options, click on Indexes, right-click, and select New
Index from the pop-up menu. This opens the following window.
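The scan-versus-index behavior described above can be observed directly. Here is a sketch using Python's sqlite3 module and its EXPLAIN QUERY PLAN statement (SQL Server exposes the same distinction through its execution plans; the exact plan text varies by version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Student (StudentID TEXT PRIMARY KEY, DepartmentID TEXT)")

query = "SELECT * FROM Student WHERE DepartmentID = 'CS'"

# Without an index on DepartmentID, the whole table must be scanned.
plan_before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_before[0][3])  # e.g. "SCAN Student"

con.execute("CREATE INDEX idx_student_dept ON Student (DepartmentID)")

# With the index, only the matching rows are retrieved.
plan_after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_after[0][3])  # e.g. "SEARCH Student USING INDEX idx_student_dept ..."
```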
• In addition, administrators also frequently want to see all courses taught by a par-
ticular Faculty member. This will require an index be built on the FacultyID column
in Course Offering.
Follow the above-mentioned steps to modify the Course Offering table.
• Select the table on the many side of the relationship (Department) and put it in Design
mode.
• Select “Tables and Columns Specifications,” then click on the “…” button.
• In the next window, change the table name and columns within each to reflect the
tables and columns associated with the one-to-many relationship.
• Take a minute to review the table and columns identified for this relationship. If they
are correct, click OK.
• Use the pull-downs for the Update and Delete rules and change both to Cascade. This
tells the RDBMS to enforce the RI constraint between School and Department.
• Click Close to end the RI constraint specifications. Next, click the top right x on the
Design window, and click Yes on the window to save these changes.
• Repeat this sequence to create RI constraints for each of the remaining one-to-many
relationships.
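The Cascade rules can be sketched with Python's sqlite3 module, where the same behavior is declared as ON UPDATE CASCADE / ON DELETE CASCADE on the foreign key (column names are assumptions):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this pragma

con.execute("CREATE TABLE School (SchoolID TEXT PRIMARY KEY, SchoolName TEXT)")
con.execute("""CREATE TABLE Department (
    DepartmentID TEXT PRIMARY KEY,
    SchoolID TEXT REFERENCES School(SchoolID)
        ON UPDATE CASCADE ON DELETE CASCADE)""")

con.execute("INSERT INTO School VALUES ('S1', 'Engineering')")
con.execute("INSERT INTO Department VALUES ('D1', 'S1')")

# Updating the parent key cascades to the dependent row...
con.execute("UPDATE School SET SchoolID = 'S9' WHERE SchoolID = 'S1'")
print(con.execute("SELECT SchoolID FROM Department").fetchone()[0])  # S9

# ...and deleting the parent deletes its dependent rows.
con.execute("DELETE FROM School WHERE SchoolID = 'S9'")
print(con.execute("SELECT COUNT(*) FROM Department").fetchone()[0])  # 0
```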
At this point, the database is ready for use. As a first step, user accounts will be created.
To create one of these roles, use Management Studio to select the University database,
click on Security and then Roles. Next, select Database Roles, right-click, and select New
Database Role.
• Give the role a name (e.g., Developer), enter "dbo" for owner, then select the
appropriate entries from the pull-down items. The following shows read, update, and
Data Definition Language authority for the Developer role.
For more precise permission control, use Transact-SQL (the command line interface) to
define a schema and associate role to manage read and write permission at the Table level.
• In classified environments, SQL Server DBAs are not allowed to have system admin-
istrator accounts and privileges as a security measure. They therefore have no control
over creation of Windows accounts.
• Having all user accounts in a single location makes them easier to manage.
• To create an account for any/all databases running on that instance, select the primary Security tab, click on Logins, then right-click and choose New Login from the pop-up menu.
• Fill in the account name for the user and enter a password. Note the checks to force
the user to change the password on the first logon and to enforce password policy
(strength of the password and periodic changes).
• Click the Default database arrow and select the default database for this user.
• Use the pull down to change the Default database to University. Next, click the User
Mapping label.
• Click the box for University, then at the bottom select Read, Update, or Developer to
assign that role to this user.
Remember to always use the primary Security tab to add users, then modify that user
account to map it to the appropriate database and assign permissions as appropriate.
7.6 BACKUP/RECOVERY
Database Backup/Recovery services give the user the capability to restore a database to a
point before a database failure occurred.
To take a copy of the database, first select the database in question.
• Review the settings. Note that the Backup type gives options for Full, Differential,
and Transaction Log.
If desired, change the backup location to a different drive or directory. Click OK to
take the backup file.
• Also note the Options tab.
• Note that by default, each new backup is appended to the output file. Appending provides more options for future recovery from user errors or application updates but, of course, results in more disk space being used.
• Click OK to create the backup file.
• To create a job to automatically take a backup on a scheduled basis, repeat the above-
mentioned steps but click Scripts in the following window.
• From this window select Steps, then click Edit to see the details of the job being
created.
A job has now been created that will run automatically per the settings recorded.
See the Jobs entry in SQL Server Agent to monitor the status of the automated
backup job.
• Next, right-click and choose Tasks/Import data to start the import wizard.
• Click Next to move to the window where the source and destination are designated.
Click the pull down on Data source to select the origin of the input. In the following,
a flat file has been selected.
• Note that the Destination is populated using the name of the input file. Click
the Destination column, then use the pull-down list to select the table being
updated.
• Select Edit Mappings to verify the mapping of input to output columns; change when appropriate. Click “Enable identity insert” if the target table has an ever-increasing integer value as its key column.
• Click OK, then select Preview to see anticipated results of the import.
Note the option to save the details of the import operation to repeat the process in the
future.
The Import/Export wizard can also be used to move table data between two different
databases.
When loading data into tables with Referential Integrity constraints, the table on the
one side of a relationship must be updated before the table on the many side.
• Right-click on Views and select “New View” from the pull down; the following window opens showing all tables available for the view.
• For each table needed, select the table name then click Add. For this query, we will
need information from the Faculty, Course Offering, and Course tables. The follow-
ing display shows the design pane after adding these tables.
• Next, click on the column names to be displayed. For Faculty, we want the instruc-
tor’s ID and name; for Course Offering, the ClassSchedule showing when it meets;
and for Course, the Course ID and name.
• Click the x at the top right of the design window to close the view, click Yes to save it,
typing in a name on the next window.
To run the query, open the Views tab in the University database and double-click the
view to be used to run the query.
• To manually see rows in the Course Enrollment table, expand the Tables option
in the University database, right-click Course Enrollment, and click on “Select top
1000 rows.”
• To further qualify the query, scroll down the top-right pane to see all of the SQL code
for the query.
• To see only students enrolled in CIS130, add “where CourseID = ‘CIS130’” to the SQL code, then click the “! Execute” button at the top of the window.
To do simple edits, select the table name, right-click, and select “Edit top 200 rows.”
All of the columns in the display are editable. Change individual entries as needed, and
click the x at the top right of the pane to close the table.
For more complex edits, create a view of the target table selecting columns of interest
and qualifying the view if necessary to select rows of interest. Save the view, then select it
by name, right-click, and select “Edit top 200 rows.”
A template opens to enter the code for the Stored Procedure. Note the green blocks for
comments and blue lines for SQL statements.
SQL code to initialize the tracking flags appears in the next window. Note the where
qualifiers which limit the updates to only new rows which have null values by default.
After adding the SQL code, click the “! Execute” button to run the code and create the
Stored Procedure.
To make changes to a Stored Procedure, use the database/Programmability/Stored
Procedure sequence to show a list of all Stored Procedures, select the one to be updated,
right-click, and choose Modify.
A window will open showing the original code in a “Modify Procedure” block. Make
changes as needed, then click “! Execute” to run the code to modify the logic.
• Enter the job name and select a Category, then select Steps to add details of specific
actions to take.
Note the buttons to move the sequence of tasks within a job. There are also options to
specify the action to take when the step completes successfully or if it fails.
• If connected to the Internet, run maintenance checks and send alerts to DBAs when
appropriate.
• Rebuild indexes to conserve space.
• As an example, at one point in time I had the responsibility of monitoring an SQL
Server instance at a remote location. To verify SQL Server was continually running,
I created a task that periodically did an export of a small table to a file, which in turn
was ftp’ed to a server used to monitor the status of system components. The date/time
of the exported file was used to verify SQL Server was running at that day and time.
The SQL Server Agent has a manual interface to Error Logs to view the current log or older logs that have been archived.
Note also that the system services for SQL Server and for SQL Server Agent are
independently managed, although of course the SQL Server service must be started first.
Although the SQL Server Management Studio gives the DBA a powerful, easy-to-use interface for defining and managing databases and for performing maintenance tasks, additional software tools are needed to provide the interfaces required by end users.
• Chapter 9 shows how Microsoft Access can be used as a GUI to provide application
functionality, using links to SQL Server tables.
• For more sophisticated applications, Chapter 11 shows how to use PHP to create a
web-based interface to an SQL Server database.
QUESTIONS
1. Explain the difference between an RDBMS and a relational database.
2. When doing hardware planning for an SQL Server database, what types of files must be planned for and mapped to disk drive(s)?
3. If you installed SQL Server on a laptop, how would you configure the database files
mentioned in #2?
4. If you installed SQL Server on a desktop computer with two disk drives, how would
you configure/store the database files mentioned in #2?
5. If you installed SQL Server on a server with RAID 5 disk drives, how would you con-
figure/store the database files mentioned in #2?
6. When configuring an SQL Server Database Backup, you can choose to overwrite or
append the new backup to the output file. Which would you choose and why? What
are the ramifications of this decision regarding recoverability options?
7. When creating a new table, how is the primary key identified?
8. When creating a table, when is the “Allow null” option selected for a column?
9. When creating a table whose key column is an ever-increasing numeric value, how is
that column defined when it is created?
10. When defining a column in a table, how can an index be created on that column?
11. When creating tables for a new database, when are indexes automatically created?
12. What steps must be taken to create a View of three linked tables in a database?
13. When defining an index on a column, does that index have to be unique?
14. What is the significance of a “Clustered Index”?
15. What does it mean to rebuild an index? When is that done?
Chapter 8
“s/(\d{4})-(\d{1,2})-(\d{1,2})/$2\/$3\/$1/”
• So, the above-mentioned specifications within the first two “/” delimiters can be read
as “four numeric digits followed by a ‘-’ followed by one or two digits followed by a
dash followed by one or two more digits.”
• The “()” pairs delimit unique string variables. In the above, the first ($1) represents the year, the second ($2) the month, and the third ($3) the day. Used in this context, they allow changing the order of the string variables between the original and the modified output.
• This “s” syntax can be read as “change yyyy-mm-dd to mm/dd/yyyy.”
In general, then, Perl scripts are used to search files and identify records that have char-
acter strings that match specific patterns. When found, that record is typically bro-
ken down into substrings, and records are written to an output file using the character
strings found.
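The substitution described above can be tried on a sample string (the log line here is illustrative, not from the book):

```perl
use strict;
use warnings;

# Apply the s/// operator from the text: change yyyy-mm-dd to mm/dd/yyyy.
my $record = "2017-11-25 backup completed";
$record =~ s/(\d{4})-(\d{1,2})-(\d{1,2})/$2\/$3\/$1/;
print "$record\n";    # prints "11/25/2017 backup completed"
```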
A typical application would be to analyze log records for a computer or database. Search
patterns are created to identify/classify log records with information, warnings, or errors,
decomposing those records, and writing these records to informational, warning, or error
output files. When finished, those output files can then be reviewed by an administrator or
database administrator (DBA) to determine the general health of the system being moni-
tored and to identify any corrective action necessary.
Search patterns can also be used to identify:
When a record is found that matches some search pattern, Perl has numerous functions that support decomposing records/input strings and analyzing their contents.1 The more useful include:
• split—to decompose a string into subfields based on a common field delimiter (often
tabs, commas, or spaces).
• substr—to create a substring of a record beginning at a specified offset and length.
• length—find the length of a character string.
• index—find the location in a string where a substring begins.
• chomp—remove line separators between records in an input file.
• while—identifies the start of a loop. Used after opening a file, it marks the start of the logic that processes each record in the file.
Using Perl to Extract and Load Data ◾ 215
FIGURE 8.1 First part of script to search file based on search terms.
See lines 10–14 in Figure 8.1 to see how a while loop is used to read all records in a file.
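A short sketch of how these helpers combine on a single record (the tab-delimited sample record is an assumption, not the book’s data):

```perl
use strict;
use warnings;

# A hypothetical tab-delimited log record.
my $line = "2017-11-25\tERROR\tdisk full\n";

chomp $line;                            # remove the trailing line separator
my @fields = split /\t/, $line;         # break the record into subfields
my $year   = substr($fields[0], 0, 4);  # first four characters of the date
my $len    = length($fields[2]);        # length of the message text
my $where  = index($line, "ERROR");     # offset where "ERROR" begins
```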
• if/then/else constructs to group logic to be used when specific search patterns are
found.
• s—the general substitute operator, for example, “s/aaa/bbb/g,” specifies changing the
string “aaa” to “bbb” whenever found.
• system—execute an operating system command. For example, it is frequently used
to execute a directory command to find the names of all files in a directory and write
those names to a file. That file in turn can then be opened in the Perl script to identify
and open files that need to be processed.
• sleep—pause the execution of a script for a specified time. This does not use any CPU
time and is used in a script that runs continuously to wait for a specified time before
continuing with the next cycle of checks/operations performed by the script.
• Arrays—a simple list accessed by index number.
Arrays are simple but extremely useful. In the following example, an array is used to
store the contents of a file of search terms of interest to the user community associated
with a specific subject area. Those terms then form the basis for a scan of a collection
of log files, in which records containing any of these terms are written to an output
file. This output file would then be imported into a database for further review and
analysis.
As one possible application, users of Department of Defense computer systems are
prohibited from using government systems for illegal purposes/acts, for example, to
search the internet for pornography. By using this approach, system administrators
can create a file of terms typically associated with pornography and use that file as
input to scan new proxy log files to identify users doing internet searches using those
terms. The script can be set up to run every hour, with the files for search results
imported into a database for further review and analysis by system analysts.
Figure 8.1 shows the first part of a script to perform these operations; the full script is shown in Appendix D.
• Lines 7–15 open the input file of search terms and load each line as one element
of the array.
• Next, lines 16–23 open the input file to be searched (in this case, a system log file).
• Lines 25–43 in Figure 8.2 show the logic for searching each record for a match of
a search term.
• Line 26 removes the record separator at the end of the first record.
• The file being analyzed is a log file containing nonprintable characters. Line 28
removes those characters from the input record.
• Lines 20–42 show how, for each record, the record is scanned for a matching
term. If a match is found, a record is written to an output file. Note the logic in
lines 35–42 that opens the output file the first time a match is found.
In practice, the output file would then be imported into a database. Analysts would
then review what was found to see what, if any, further actions are necessary.
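The full script appears in Appendix D; a minimal sketch of the logic just described, with the term list and log records inlined rather than read from files (both are illustrative assumptions), might look like:

```perl
use strict;
use warnings;

# Search terms, loaded one per array element (the real script reads a file).
my @terms  = ("porn", "gambling");
my $nterms = scalar @terms;

# Scan each log record; keep any record containing one of the terms.
my @out;
for my $record ("GET /news/today", "GET /gambling/poker") {
    for my $i (0 .. $nterms - 1) {
        if (index($record, $terms[$i]) >= 0) {
            push @out, $record;    # the real script prints to an output file
            last;                  # one match is enough for this record
        }
    }
}
```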
As another example, Oracle Database Administrators can use this approach to
constantly monitor Oracle database logs for informational, warning, and error
conditions that are of interest.
• First, they would create three different files of search terms, each containing the
Oracle message number/label of interest for that condition/state.
• When started, the script would read each set of search terms into different arrays.
• Next, the script would open and read the Oracle log file, searching for log records with a date–time stamp more recent than the last time the script was run.
FIGURE 8.2 Second part of script to search file based on search terms.
• When a new log record is found, it would be checked against each of the three
arrays of search terms, with any matches written to an associated output file. Note
that the name of the output file would contain an element identifying when the
file was created.
• The script would save the most recent date–time stamp from the log file for use
on the next cycle.
• The script would then sleep for a designated time frame (e.g., 30 min) before run-
ning again.
This script would run continuously, and at the end of each cycle, the files of search
results would be moved to a directory to be reviewed by DBAs monitoring the data-
base system’s overall health.
• Hashes (associative arrays)—a hash is an array whose elements are indexed by a string key rather than a number. Hash variables provide extremely fast access when searching the array for values associated with a key term.
They are extremely useful when searching files for records that contain a match for some term in a predefined list. Terms of interest are first loaded into a hash when the script starts. Next, when a new record is read and broken into substrings, each substring can be compared against the hash to see if there is a match.
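As a sketch (the term list and record below are assumptions), a hash lookup can replace the inner loop over an array of terms:

```perl
use strict;
use warnings;

# Load terms of interest into a hash for fast exists() lookups.
my %terms;
$terms{$_} = 1 for qw(error warning fatal);

# Break an incoming record into substrings; keep those found in the hash.
my $record = "backup finished with warning at 02:00";
my @hits = grep { exists $terms{lc $_} } split /\s+/, $record;
```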
• time, localtime—get the current date and time.
• A “.” is used to concatenate two character strings, for example, $newstring = $string1 . $string2;
• int—a function used to return the integer portion of a variable or calculation.
• %—the modulus operator, which returns the remainder after dividing one integer by another. I have often used this to make decisions on the basis of the current time stamp when an operation begins.
For example, assume an environment where one server acts as a file collection system that receives files and then passes them off to one of three servers for detailed analysis/processing. Let us also assume Server1 and Server2 have half the processing capability of Server3. Let us look at how % can be used to evenly distribute the workload across these three servers.
• First, get the current time and set a variable to the current value of the minute, $min.
• Next, compute the value of a variable based on a modulus of four by computing “$val = $min % 4;”. As a result, the value of $val will vary from 0 to 3.
Note that by using a modulus of 4, each value of $val will occur 25% of the time.
The value of $val, therefore, can be used to invoke logic blocks that will be acti-
vated 25% of the time, based on the clock time when the logic starts. By using this
approach, files can be forwarded using this logic:
$val = $min % 4;
if ($val == 1) {
<commands to ftp/sftp files to Server1>
}
elsif ($val == 2) {
<commands to ftp/sftp files to Server2>
}
elsif ($val == 3) {
<commands to ftp/sftp files to Server3>
}
elsif ($val == 0) {
<commands to ftp/sftp files to Server3>
}
Note also that because Server3 is twice as powerful as the other two, it will receive
files 50% of the time, and each of the other servers receives files 25% of the time.
• Scripts often must invoke system (operating system) commands. Those are of course
different between Windows and Unix.
• If the script interrogates directories to find and process files, the most significant changes involve the separator characters used in directory paths: Unix paths use “/” separators, whereas Windows uses “\” characters.
As an example, assume the user Analyst has a Documents directory that in turn contains an
“InputFiles” directory. Our script needs to identify all files within the InputFiles directory
and open and process each. Let us look at the code necessary to find and open each file.
For Unix, the script would first issue a command to create a file that contains the names
of all files found there. The Perl commands would be
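A minimal sketch of this Unix approach (the work-file name filelist.txt is an assumption; the directory path matches the one used in the text):

```perl
use strict;
use warnings;

# Shell out to "ls" to list a directory into a work file, then read that
# file back to get each name, ready for open() and per-record processing.
sub files_in {
    my ($dir) = @_;
    system("ls $dir > filelist.txt");
    open(my $list, '<', 'filelist.txt') or die "cannot open file list: $!";
    my @names;
    while (my $name = <$list>) {
        chomp $name;
        push @names, "$dir/$name";    # full path for a later open()
    }
    close $list;
    return @names;
}

# e.g., my @files = files_in("/home/Analyst/Documents/InputFiles");
```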
• First, the details of the directory path must be modified to use “\” separators. The Unix directory “/home/Analyst/Documents/InputFiles” shown earlier becomes “\Users\Analyst\Documents\InputFiles” under Windows naming conventions.
• Next, a Perl variable set to “\Users\Analyst\Documents\InputFiles” would not work as is because the “\” character has a special meaning in Perl: it is an escape character that tells Perl to treat the next character in the string literally and to ignore any special meaning associated with it.
For example, to use a “$” as part of a search pattern, we would use “\$” in that pattern to specify the use of the $ as a character. Otherwise, Perl treats the $ as a meta-character marking the end of a string.1
To tell Perl we want to treat the “\” as a character value and not as a meta-character,
we change the string to include two “\\” characters:
“\\Users\\Analyst\\Documents\\InputFiles.”
• In addition, the command to list files in a directory must be changed from the Unix “ls” to the DOS equivalent “dir”.
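A sketch of the Windows variant (the path comes from the text; the /b switch for bare file names is my addition, not from the book):

```perl
use strict;
use warnings;

# Doubling each "\" makes Perl store a single literal backslash.
my $dir = "\\Users\\Analyst\\Documents\\InputFiles";

# "dir" replaces the Unix "ls"; guarded so this only runs on Windows.
if ($^O eq 'MSWin32') {
    system("dir /b $dir > filelist.txt");
}
```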
I have rarely had to move/migrate scripts between Unix and Windows. However, the fact
that Perl scripts are 98% portable made it much easier and faster to develop new applica-
tions or services by reusing sophisticated logic regardless of the computer platform.
Option 1:
Option 2:
#note the first pattern calls for either one or two digits
if ($record =~ /(\d{1,2}):(\d{2}) AM/) {
#additional logic on what to do
}
• Note that SQL Server error messages are much simpler than Oracle log files, classifying each message as informational, warning, or error.
• SQL Server logs are installation specific. In this example, they were installed by default
in the C:\Windows\Program Files\Microsoft SQL Server\MSSQL\Log directory.
• The simplest way to monitor the system is to continually monitor changes in the SQLAGENT.OUT file in SQL Server’s Log directory. In this file, informational messages contain a “?” character, warning messages contain a “+”, and error messages contain an “!”.
The following Perl script (shown in Appendix E) opens the SQLAGENT file and
scans for informational, warning, and error messages. When found, they are writ-
ten to a corresponding output file. At the end of the check cycle, the script pauses
for 30 min and repeats the checks. Note that each output file includes a date/time
stamp as part of the name to separate the information being generated from each
cycle. At some appropriate time interval, all output files would be moved to a directory used by DBAs to monitor the database’s status. Typically, another Perl script would run continuously in parallel to check for new files associated with error messages and display a white-on-red status message to alert DBAs to review new files for the latest error messages/updates.
Let us analyze this script in detail, starting with Figure 8.3.
• Line 5 defines a variable for the location of the SQL Server SQLAGENT file.
• Line 7 sets a user variable for the time delay between log checks. Here, it is set to
30 min, with the associated number of seconds computed in line 9.
• Line 9 determines the point at which the logic repeats with the next cycle.
• Lines 10–12 set flags for each class of output file; these flags control the opening of
each class of file when the first record for each is found.
• Line 13 gets the current date and time information. A date–time variable is
extracted in line 15 to make them usable as part of a file name; spaces and “:”
characters are altered in lines 17 and 18.
• The log file is opened in line 20, and the cyclical check of the log records begins
with line 21.
• Line 22 removes the line separators in the record.
• Line 23 removes all unprintable characters.
• Lines 23–27 extract the date–time string from the last non-blank record.
• In line 29, the date–time string is compared with the last date–time string from
the previous check. If the current record has a date–time string with a lower value,
it has already been checked and can be ignored.
• Lines 31–41 show the logic for handling informational records (i.e., those con-
taining the “?” character).
• Lines 32–37 show the logic executed the first time an informational record is
found in this check cycle. The file name is created embedding the date–time vari-
able set in lines 14–19, the file is opened, and the log record written to it.
• In lines 38–40, if the output file is already open, the log record is written to it.
Figure 8.4 shows the remaining part of the script with logic for warning and error log records.
• Lines 42–52 duplicate the logic for lines 31–41 modified as needed to handle
warning records identified with a “+” character. Note the use of the $warn vari-
able to control the warning output file.
• Lines 53–63 duplicate the logic for lines 31–41 modified as needed to handle error
records identified with an “!” character. As was done earlier, the $err variable
controls the error output file.
• After the file is checked, the last date–time string is saved in line 66. All files are
then closed in lines 67–70.
• The local time is obtained in line 71 and used in the message generated by line 72.
• The sleep function invoked in line 73 temporarily halts all processing, and the
process sleeps until the specified time delay ends.
• At that point, the logic is redirected by line 74 to resume processing at the START_
CHECK label at line 9.
Note that this process will loop continuously until manually interrupted/canceled.
• Open the Event Viewer and open the “Windows Logs” label to display the types of
logs available for analysis. Note the Application, Security, and System labels.
• Select either the Application, Security, or System labels and note the list of log records
that have been created. In the right-most pane, select “Save all events as” to create an
external file containing these log records.
• Use the “Save as” browser window to navigate to the working directory in which
these logs are to be stored for analysis.
• Next, change the “Save as type” to Text to create a tab-delimited text file using a file
name indicating the type of log records being used.
• The script shown in Figures 8.3 and 8.4 can be modified to review and summarize these log records. These changes are fairly straightforward:
• Change line 5 to indicate the directory path and name of the input file created earlier.
• This script will be used once after each log record extraction and not run con-
tinuously. As such, the date–time information is not needed, so the code in lines
25–29 and 71–74 can be commented out or deleted. Note that the $datetime vari-
able is still set for use when creating the name of the output file of log records.
• In line 31, change the specification for information records from “/\?/” to
“/Information/.”
• Similarly, in line 42, change “/\+/” to “/Warning/,” and in line 53, change “/\!/” to
“/Error/.”
• Finally, modify the $infofile, $warnfile, and $errfile variables to include the names
“Application,” “Security,” or “System” based on the type of log file being analyzed.
• Repeat this process, changing the name of the input file and the $infofile, $warnfile,
and $errfile variables as appropriate.
These scripts are now ready for use. To analyze one or all of the Windows logs, use
Event Viewer to extract and save the records to be reviewed in a tab-delimited file,
then run the appropriate extraction script.
• If SQL Server is running on that system, an agent can be created to periodically export
the contents of a small table. Checking the date and time stamp of the exported file
verifies that both the operating system and SQL Server are up and running.
• A Perl script can be written to create an output file each time it runs, then sleep for a specified period and repeat the process. Monitoring the date and time stamp of the output file confirms that the host and the Perl script were active at that date and time.
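A minimal heartbeat sketch along these lines (the file name and 30-minute interval are assumptions):

```perl
use strict;
use warnings;

# Write a time-stamped status file; the monitoring side checks its stamp.
sub heartbeat {
    my ($file) = @_;
    open(my $out, '>', $file) or die "cannot write $file: $!";
    print $out scalar(localtime), "\n";
    close $out;
}

# Production loop: while (1) { heartbeat("heartbeat.txt"); sleep 30 * 60; }
```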
• Next, a collection/monitoring script is needed to check the results of all monitoring scripts, either by directly checking the output directories in which these scripts run or by ftp/sftp-ing the contents of these directories to a local directory. The monitoring script can then continuously check the health of a number of remote systems without requiring an analyst/administrator to continuously connect to each to perform a manual verification. When run on Unix, I always reset the display to white on red to make the information stand out.
If this monitoring script runs on Windows, it can be made even easier to use by
not only displaying error messages when errors are found but also playing selected
Windows sounds to get the attention of the analyst/administrator running the system
checks. My personal favorites are the sound of glass breaking followed by a tinkling
sound. They get your attention!
• Perl scripts can easily monitor a directory for new files that have been created and
then take an appropriate action to forward or process the new files.
• The command-line option of Perl can be used for sophisticated editing of the file con-
tents, for example, to enforce consistency of formatting of date/time strings.
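As a sketch of such an edit, normalizing mm/dd/yyyy dates to yyyy-mm-dd (the sample data and file name are assumptions):

```perl
use strict;
use warnings;

# From the shell, this edit could run in place over a file as:
#   perl -pi.bak -e 's{(\d{1,2})/(\d{1,2})/(\d{4})}{$3-$1-$2}g' somefile.log
# (file name hypothetical). The same substitution on an in-memory string:
my $line = "logged on 11/25/2017 and 1/2/2017";
$line =~ s{(\d{1,2})/(\d{1,2})/(\d{4})}{$3-$1-$2}g;
```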
• First, a file of terms associated with pornography can be loaded as a set of search terms.
• Next, new proxy log records are reviewed to identify any internet activity associated
with those terms.
• Finally, records that match one of these search terms can be imported into a pornography table for analysts to review in detail. This normally involves review of all activity for a user over a period of time to identify real abuse versus a one-time finger-check/error.
A web-based interface such as described in Chapter 11 can be created for analysts to
allow them to classify records as Reviewed (no further analysis required), Pending
(current records under review), and Reported (activity/records that have been
reported).
Fortunately, files created with Perl can be easily imported into a database.
• Files can be directly imported into Microsoft Access, most commonly using a tab or
comma delimited file as input.
• If using SQL Server, files can be imported into a table using SQL Server’s bcp
utility.
• If using Oracle, files are imported using Oracle’s SQL*Loader utility.
Depending on the design and content of what is being monitored, the import function
may be totally sufficient to update the database. In a more sophisticated application, such
as monitoring and reporting on proxy log activity described in Chapter 11, newly imported
rows need status flags set accordingly. Those can easily be handled within the database as
part of import processing.
While the import processing is very straightforward, there are some subtleties that must be checked.
• The order of the data in the file to be imported will normally match the sequence of
columns in the table’s definition.
• Each row will normally have a key field/column embedded within the data record. In
some cases, rows are identified only by an ever-ascending row value which is incre-
mented automatically when new rows are imported.
• Imported files are driven/controlled by delimiters that separate the strings represent-
ing different columns. Check carefully to ensure that the delimiters chosen for use
never appear within the character strings being imported.
• In some cases, post-processing of data in new rows is necessary. For example,
in the proxy log monitoring discussed earlier, newly imported rows have null
values for the Reviewed/Pending/Reported columns which should all be set by a
database update to have N values, which flag new records that need to be reviewed
by analysts.
8.9 SUMMARY
I have spent most of my work life with a database-centric point of view. In my later years as a developer of database systems, I have continually found user requirements that involved finding data within files of all types, extracting it, and transforming it into the information needed by the user community. Perl has been invaluable for this. Going forward, I would recommend including Perl (or Python) in every developer’s tool bag.
QUESTIONS
1. What must be done to a PERL script being migrated from a UNIX to a Windows platform?
2. What are search patterns? What purpose do they serve?
3. Describe a pattern that will detect a date in the format of “mm-dd-yyyy.”
4. Show the PERL command to change a date string “mm-dd-yyyy” to “yyyy-mm-dd.”
5. What is the purpose of the split command? Give an example of how it can
be used.
6. What is the purpose of the index operator? Give an example of how it can be used.
Refer to the PERL script shown in Figures 8.1 and 8.2.
7. What is the purpose of the $nterms variable?
8. What is the purpose of the “if” test in lines 31–42?
9. What is the purpose of the $match variable?
10. What records are written to the Out file?
11. If you wanted to expand the search to include additional terms, what would you do?
Refer to the PERL script shown in Figures 8.3 and 8.4.
230 ◾ A Practical Guide to Database Design
REFERENCE
1. Medinets, D., PERL 5 by Example, Indianapolis, IN, Que Corporation, 1996, p. 203.
Chapter 9
9.1.2 Advantages
• Microsoft Office is standard software in essentially all government offices, including
those in the Intelligence Community and DOD.
• It is easy to import data into tables from flat files, Excel, or other databases.
• User functionality can be divided between a Data database having the core tables
needed by users and a second database serving as the graphical user interface (GUI)
to the data. The GUI would contain the query and update mechanisms and the Forms
and Reports needed by users; its tables would simply be links to the physical tables residing
in the Data database. Each user would then be given a copy of the GUI database to
run on their workstation.
• If/When necessary, a table can be manually viewed, sorted, and/or filtered to view
data of interest. Those rows can subsequently be updated or deleted as needed.
9.1.3 Disadvantages
• Complex queries often require numerous work tables, queries, or macros to achieve
a specific result. From the database administrator perspective, documentation for the
design and functionality of the numerous components can prove difficult to write and
maintain.
• Complex user displays can be difficult to create.
9.2.2 Advantages
• Queries and Forms are normally quick and easy to design and test. Developers can
get immediate feedback from users on the effectiveness and usefulness of the
prototype being developed.
• Access is preinstalled on all DOD and Intelligence Community computers.
Building User Interfaces ◾ 233
• ODBC links can be easily created to access SQL Server or Oracle databases running
on the server.
• More complex queries can be passed to the server using the “Pass-Through” option.
This passes the SQL for the query directly to the database on the server for interpreta-
tion and execution.
• As stated earlier, tables can be manually viewed, sorted, and/or filtered to display data
of interest. These rows can then be updated or deleted as needed.
9.2.3 Disadvantages
• Complex queries are difficult to support. Multiple simple queries can be tied together
to achieve a more complex result, but documenting the details of the design can be
difficult.
• The GUI database must be installed on each user’s computer.
• More complex user displays can be difficult to implement.
9.3.2 Advantages
• The Windows displays built in a .NET Framework are very powerful. Very complex
windows can be displayed with action icons/buttons.
• These tools can interoperate with other Windows-based tools. For example, a display
can be generated to interact with vendor software to monitor and control that ven-
dor’s surveillance cameras.
• Object-oriented “Design Patterns” are quite beneficial.
For example, the “Template” approach is quite effective in creating mechanisms to control
different types of surveillance cameras. Essentially, all cameras require the same type of
agents to control camera functions (tilt, zoom, etc.), but individual agents would contain
the vendor-specific code to operate that type of camera.
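The "Template" approach described above can be sketched in a few lines of Python; the class and vendor names here are hypothetical, not from any real camera API. The base class fixes the control sequence common to all cameras, while each vendor subclass supplies only its vendor-specific command transport.

```python
from abc import ABC, abstractmethod

class CameraAgent(ABC):
    """Template: the control sequence is the same for every vendor."""
    def reposition(self, tilt, zoom):
        # Fixed sequence of steps; only send() varies by vendor.
        return [self.send(f"TILT {tilt}"), self.send(f"ZOOM {zoom}")]

    @abstractmethod
    def send(self, command):
        """Vendor-specific transmission of a control command."""

class AcmeCamera(CameraAgent):
    """Hypothetical vendor subclass supplying the vendor-specific code."""
    def send(self, command):
        return f"ACME:{command}"   # placeholder for a real vendor API call

print(AcmeCamera().reposition(45, 2))  # ['ACME:TILT 45', 'ACME:ZOOM 2']
```

Adding support for a new camera type then means writing one new subclass, leaving the shared control logic untouched.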
9.3.3 Disadvantages
• The .NET Framework runs only on Microsoft Windows platforms.
• .NET implementations are totally dependent on the specific version of Microsoft
operating system and internals used to build them.
• The .NET Framework and tools that run under it must be installed on each computer
to be used.
• Development and support of .NET Framework tools are complex and deeply
interwoven with Microsoft internals, making these tools difficult to maintain.
• I found it difficult to update an existing .NET-based tool because the behavior of the
application can be mouse- or cursor-sensitive, in which case the developer does not see
the code causing some result.
9.4 PHP
PHP is a widely used scripting language that can be used to develop web pages displaying
information and options to users. Chapter 11 describes the creation and use of a web page
to monitor database updates using PHP.
9.4.2 Advantages
• Web pages built with PHP are independent of the operating system on client
computers.
• Although hidden from users, the logic and SQL commands are seen in clear text by
developers and are easy to read and maintain.
• PHP software can be downloaded at no cost.
9.4.3 Disadvantages
• The setup for a PHP-based implementation is somewhat complex.
• It requires (of course) a web server running on the host.
• Typically, data encryption of web traffic is required.
• PHP requires database drivers for the DBMS to be used.
• PHP must be set up and activated on the host before development can begin.
9.5 JAVA
I have personally never developed web pages with JAVA. I have, however, worked in a sup-
port role in which a team of developers were using JAVA to build entire database applica-
tions for users.
9.5.2 Advantages
• Development tools are commonly available.
9.5.3 Disadvantages
• Developers require additional training and use of development tools.
• JAVA is, in my opinion, not suitable for rapid prototyping.
QUESTIONS
1. When using Microsoft Access as a GUI, name two RDBMS’s that can serve as the
source for a database query.
2. When using Microsoft Access as a GUI, what is a linked table?
3. When using Microsoft Access as a GUI, how can a complex query using DBMS-
specific SQL be executed?
4. By default, where is a Microsoft Access database installed? What users can run que-
ries against the database?
5. Compare the use of Microsoft Access and PHP when performing a rapid prototype
for a new database application.
A design team supporting a small group of analysts must do a rapid prototype to develop
a GUI to be used to analyze data collected for the team.
6. What relational database management systems could be used to store the database
and its tables?
7. Would a .NET framework be appropriate for use to develop the GUI? Why or why
not?
8. Would Microsoft Access be appropriate for use as a GUI? Why or why not?
9. Would PHP be appropriate for use to develop the GUI? Why or why not?
Assume a university environment where hundreds of teachers and administrators
have Windows-based computers varying in size and capacity and with a mixture
of Windows 7 and Windows 10 operating systems. GUIs must now be developed to
administer course and student information.
10. Would a .NET framework be appropriate for use? Why or why not?
11. Would Microsoft Access be appropriate for use? Why or why not?
12. Would PHP be appropriate for use? Why or why not?
Chapter 10
In Chapter 6, we saw how Microsoft Access can be used to create and load tables, run
simple queries, and create simple Forms. In the current chapter, we will see how to use
Access to create a more complex application based on the University data model developed
in Chapter 4. We will use Forms to create a Windows-based application to
Let us begin by creating tables for the University database using the physical data model
in Figure 4.7.
• From the Start menu, click on the Access label for your system (e.g., “Access 2016”) to
start the RDBMS software on your computer.
• Next, click “Blank Database” to start a new database management system instance.
• When the “Blank Database” screen opens, click on the folder icon and navigate to
the directory in which the RDBMS is to be located. Next, change the file name to
“University.accdb,” and click “Create.”
• The Table1 window opens in data entry mode with a default name for the first column
as ID. Click the small “x” at the top right of the Table1 pane.
• A window will open for you to specify the table name, enter “School,” then click
“OK.”
• We now have a window where we can enter the column names (under Field Name)
and the data type for each.
For the School table, enter the column name SchoolID and choose “Short Text” for a
data type; in the lower part of the window, change the Field Size to 50 bytes.
Creating the University Database Application ◾ 239
• Continue entering the column names and data types for the School table.
When entering the data type for a column, note that Access does not support varchar;
instead, choose "Short Text" as a data type, and in the lower pane, change the field size
to 50.
• Verify that the key symbol appears in the row(s) for the key column(s) for this table
and click the small “x” on the School line to close the table. In response to “Do you
want to save changes to the design of table ‘School’,” click “Yes.”
• Referring to Figure 4.7, repeat this sequence to create all tables in the physical data
model.
• Select all table names and then click Add; all tables will be displayed.
Next, rearrange the tables to reflect the layout used in Figure 4.6. (Minor revisions in
the diagram are fine; it is just easier to group associated tables together.)
A screen will open with a “Student” form in design mode. Note that all columns from the
Student table have been added to this window.
This form could be used as is with no changes; however, remember that we want to use
a pull-down box for DepartmentID to choose an acceptable value rather than having to
type the value.
To make this change
• Select the empty box to the right of where the DepartmentID label appeared, then
right-click and select “Delete.”
The form design has now been modified to allow a space for the DepartmentID drop-
down box.
• Click the Design tab and scroll/find the drop-down box icon; click on that symbol as
shown in the following.
• After clicking the Combo box icon in the above-mentioned location, the following
dialog opens:
• First select DepartmentID as shown earlier, then click the “>” symbol to get the
following:
• We have now specified that we want a Combo box (a pull-down list of values) to select
DepartmentID; the next window gives us the option of choosing a sort option.
This screen gives us the option to adjust the width of the display; the values shown
earlier show the test values previously entered for Departments.
• In the next window, choose the “Store” option and select DepartmentID to store the
value retrieved into the form as DepartmentID.
• Clicking Next brings up a window asking for a name for the list; enter DepartmentID.
• The Form has now been modified to include the Combo box just created.
The Form has now been updated to include a drop-down list with all DepartmentIDs.
To finish the design of the form, we need to make one further modification. At the
bottom of the form layout, add a “Save” button to add this information to the database.
• With the form in design mode, select the Design and Button options as shown in the
following.
• Next, move the mouse to the bottom of the form and drag it open to form a box.
Releasing the mouse gives the following:
• Click Finish, then enter a meaningful name at the top of the form.
Opening the form opens a blank record where student information can be entered.
Note the drop-down icon on the DepartmentID line.
• Clicking on that arrow displays a list of all DepartmentIDs. Choose the one appropri-
ate for the student being enrolled.
• Fill in the other information for that student, then click the Save button to add the
information to the Student table and close the form.
• To use the form, click on Forms/New Student and the following screen opens. Note
that the pull-down is populated with a list of DepartmentIDs in the Department table.
Select the DepartmentID for the new student, fill out the other items, and click Save
to add that information to the Student table.
One of the last things we will do in this section is to create a “Main” page with con-
trols to activate/perform all of the functions needed. We will add a button to that
page to open this form when needed.
If you also need a form to modify Student information, follow the above-mentioned
steps, but instead of creating a Combo box for DepartmentID, create a Combo
box based on StudentID in Student. That form would then read and display all
information for the selected student, and Save would close the form with those
updates.
• Using a form linked to a work table, capture the information about the student,
course, and section involved.
• Save the form to update the work table with this information.
• Insert these data into the Course Enrollment table.
• Delete the information in the work table (to clear it for the next enrollment).
• Reopen the enrollment form to start a new enrollment record.
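The work-table sequence above boils down to three SQL statements. The sqlite3 sketch below illustrates the pattern; table and column names are illustrative stand-ins, not the book's exact Access objects.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE work_enrollment   (CourseID TEXT, SectionID TEXT, StudentID TEXT);
    CREATE TABLE course_enrollment (CourseID TEXT, SectionID TEXT, StudentID TEXT);
""")

# 1. The form saves the user's choices into the work table.
con.execute("INSERT INTO work_enrollment VALUES ('CS101', '001', 'S-1001')")
# 2. Copy the captured row into the permanent enrollment table.
con.execute("INSERT INTO course_enrollment SELECT * FROM work_enrollment")
# 3. Clear the work table, ready for the next enrollment.
con.execute("DELETE FROM work_enrollment")

print(con.execute("SELECT * FROM course_enrollment").fetchall())
# [('CS101', '001', 'S-1001')]
print(con.execute("SELECT COUNT(*) FROM work_enrollment").fetchone()[0])  # 0
```

In the Access application, steps 2 and 3 correspond to the Append and Delete queries wired to the form's Save button.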
• Delete the CourseID row and replace it with a Combo box to look up DepartmentID
values from the Course Offering table, that is, show a list of DepartmentIDs for all
courses being offered.
• Delete the SectionID row and replace it with a Combo box to look up SectionID val-
ues from the Course Offering table.
When this Combo box is built, a query is created with the syntax “Select SectionID
from [Course Offering];”, and the list returned will contain all SectionID values
for all course offerings. The query needs to be changed to include the CourseID
value from the previous line; that is, show only SectionID values for the selected
course.
To make this change, select the SectionID line just created and select the
SectionID box.
Right-click that box and click Properties to see the details behind the Combo box.
On the Data tab of the Property sheet, note the beginning of the “Select” clause for
the Structured Query Language (SQL) query.
This query must be modified to include the CourseID value from the form. To get this
value, go to the CourseID line and click on the CourseID box.
The Property sheet for this box shows at the top the internal name for this box/value;
in this case, “Combo21” is displayed. This is the value that needs to be included in the
SectionID query.
Going back to the SectionID query, click on the box with the three dots to open the
query.
The design mode can be used to add the qualifier of CourseID = Combo21. I prefer
to use the SQL mode, so right-click the design view and choose “SQL View” to get
this display.
Click the x at the top right of the design window to close and save this change.
When using this form and clicking on the SectionID drop down, the list will now
show SectionIDs for the CourseID selected on the form.
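The effect of the modified combo-box query can be sketched with sqlite3; the table, column, and data values below are illustrative.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE course_offering (CourseID TEXT, SectionID TEXT)")
con.executemany("INSERT INTO course_offering VALUES (?, ?)",
                [("CS101", "001"), ("CS101", "002"), ("MA201", "001")])

# Value taken from the CourseID box on the form (Combo21 in the text).
selected_course = "CS101"

# Unqualified, the combo box would list every SectionID; the added
# qualifier restricts the list to sections of the selected course.
sections = [r[0] for r in con.execute(
    "SELECT SectionID FROM course_offering WHERE CourseID = ?",
    (selected_course,))]
print(sections)   # ['001', '002'] -- only sections for the selected course
```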
• Delete the StudentID line and replace it with a Combo box displaying the StudentID
values from the Student table.
The basic changes in the form are now complete.
Next, with the form in Design mode, start to add a Button at the bottom of the form;
when the first question comes up, click “Cancel.” Next, click on the button just cre-
ated, right-click, then pick “Build Event” at the top of the list of options shown.
Next, pick “Code Builder” from the options shown.
This is the window that we will use to enter commands to perform the following
actions:
• Close the form (thereby saving data that was entered in the “Selected Student
Enrollment” table).
• Run an INSERT command to add that data to the Class Enrollment table.
• Delete the data in the “Selected Student Enrollment” table.
• Reopen the form to clear it to enter another student.
Opening and closing forms are simple commands. Let us concentrate first on how to
run the INSERT and DELETE commands; I recommend creating queries to perform
both functions.
• For the INSERT command, use the Query wizard to start a new query and switch to
“SQL View”. Using the “INSERT INTO …” syntax, enter the following command:
• After saving the query, open it again in Design mode, right-click the upper pane,
move the cursor down to “Query Type,” and select “Append Query.”
• The query can now be run manually to test it, if desired.
• For the DELETE command, use the Query wizard to start a new query and switch
to “SQL View.” Enter the following command:
• Close and save this query. Run it manually to test it, if desired.
You are now ready to add commands to the button just created to add the functional-
ity described earlier.
• With the form in Design mode, click on the button added earlier, right-click it,
and choose “Build Event” at the top of the pop-up menu.
• Between the “Private Sub …” line and the “End Sub” lines, insert the following
commands:
Note the “SetWarnings” commands. The INSERT and DELETE commands will
notify/warn the user of pending updates; turning the warnings off for these com-
mands makes for a smoother operation.
• Rename the label in the new button to “Save,” and close the form, saving all updates.
The form is now ready for use.
• Create a work table “Selected Student” with columns DepartmentID, SectionID, and
StudentID.
• Open the Query wizard to Design mode and add tables “Selected Student” and
“Course Enrollment.”
• Holding the mouse down, move the cursor between DepartmentID in the two tables;
repeat that operation for SectionID and Student ID.
Continue adding Assignment columns from the Course Enrollment table until all
Assignments are added.
• Rearrange the boxes displayed in the Design panel to be more readable; for
example:
• As one final step, add a button at the bottom of the form. When the query wizard
starts, cancel the operation, select the box and right-click it to open the “Build Event”
option. Enter "DoCmd" commands modeled after those shown earlier to close the
form, then reopen it.
The form can be closed and saved and is ready for use.
• Following the aforementioned steps, use the Selected Student and Class Enrollment
tables to create a query displaying only CourseID, SectionID, StudentID, and
MidTerm Grade.
• Select this query and use the Form wizard to create a new form.
• As was done earlier, change CourseID, SectionID, and StudentID to create Combo
boxes for these elements based on Course Enrollment.
• Add a button at the bottom that will close and reopen the form just created.
• Once again, following the above-mentioned steps, use the Selected Student and Class
Enrollment tables to create a query displaying only CourseID, SectionID, StudentID,
and Final Grade.
• Select this query and use the Form wizard to create a new form.
• As was done earlier, change CourseID, SectionID, and StudentID to create Combo
boxes for these elements based on Course Enrollment.
• Add a button at the bottom that will close and reopen the form just created.
• As the wizard for each button opens, choose Form Operations and Open Form.
• Next, pick the form name to be opened.
• Text labels can be added if desired to label different sections of the display.
• If desired, the background color can be changed.
• Save this form with an appropriate name, for example, “Main.”
Your “Main” form will then look something like this:
This “Dashboard” can then be used to invoke any of the functions created within the
database.
Testing and using each component will show additional minor changes needed. For
example, most forms opened may need to be modified to add a button to take the user
to the “Main” page.
• Begin by copying the original database into a “Data” database and again to a “User
Interface” database.
• The “Data” database will contain only the core tables running on a master computer
that is network accessible to the other staff members. All other components are
deleted from the “Data” database.
• The User Interface database is modified by deleting all data tables (not the
"work"/"Selected…" items) and recreating each table as a link to the Access
"Data" database. This User Interface database is then given/copied to each user's workstation.
Hosting applications in this way supports the rapid development and testing of applica-
tions. As new versions of the application are created, users just need to be given a copy of
the new User Interface for the application.
QUESTIONS
1. Describe the difference between a database Table, a View, and a Query.
2. By default, what columns have indexes created when a Table is defined?
3. What is a Form? How does it differ from a Query?
4. When using the Query wizard to create a query, what does the “View” mode/button do?
5. What are the different types of queries supported by Microsoft Access? What does
each do?
6. What is a Pass-Through query? When/Why is it used?
7. What is an Update query? Why/When is this specified?
The remaining questions are based on the Microsoft Access University Database.
8. What is the purpose of the “Reset StudentID” query?
A query must be created to display all students associated with a specific department.
9. When using the query wizard, what tables are needed?
10. To run the query, what can be done to identify/select the DepartmentID of interest?
A query is needed to show all students enrolled in any course for a specific department.
11. What tables must be included in the query?
12. When designing the query, how are specific columns included or excluded from the
output display?
13. What is the purpose of the “Main” form, and what does it contain?
14. When using the Form wizard, what object is typically chosen as the basis for the
Form design?
15. If the database design described in Section 10.3 were to be altered to store core data
tables in SQL Server, what changes must be made to the table definitions?
Chapter 11
• If all users are on the same network, Microsoft Access can be used to create and
deploy user interfaces that are each linked to a master database containing the tables
needed/used by users.
This approach is fast and simple but limited by the functionality available in using
Access. In addition, there may be significant overhead in deploying/maintaining the
Access graphical user interface on each user’s computer. There may also be design/
implementation issues if users need to simultaneously read and update the same data.
• Microsoft’s Visual Studio software suite allows you to create complex and powerful
user interfaces/applications. However, because these applications must be deployed
on each user’s computer, these implementations create software dependencies for
computers on which they are to be deployed and are dependent on the version of the
Windows operating system used to develop the application. In addition, from the
developer’s point of view, applications that are sensitive/aware on mouse movements
and actions are more difficult to develop and maintain.
• Java can be used but again has software dependencies and introduces a higher level of
software complexity to design and deploy the application.
Section 7.3 covered the design and implementation of an SQLSvrLogs database. Next,
Section 8.5 includes a description of how to use PERL to extract Warning and Error mes-
sages from an SQL Server log. This section ties this material together and shows how to
develop a PHP web-based application that allows users to identify and review new Warning
and Error log records as they occur.
Note that this application is based on the process of capturing log records of interest,
importing them into SQL Server tables that include the message content plus
columns/flags to denote Reviewed, Pending, and Resolved statuses, and providing a
user interface to monitor and update this information as new updates are applied. All
of these steps can run 24 × 7 and allow the user to focus on new log messages as they
occur. In addition, the overall design and components are very generic in nature. This
approach can easily be adapted to monitor and review log records in another database.
For example, system administrators tasked with monitoring proxy logs for internet traffic
could use this approach, using status flags to denote Reviewed, Pending, or Reported
statuses.
Let us now review how to implement all of these components on a single host that serves
as the web interface for tracking SQL Server log messages.
• The host itself must of course provide a web server such as IIS or Apache. When
using IIS, setup issues include
• Establishing the host’s web directory that will hold the scripts and HTML files
that are to be accessed/used by external users.
PHP Implementation and Use ◾ 277
• Creating a "Handler Mapping" that associates PHP scripts (i.e., files ending in
“.php”) with IIS’s FastCGI module. This provides IIS with a high-performance
mechanism for processing PHP scripts.
• Any supporting files for web page management would be placed in subdirectories
in the directory defined earlier.
• PHP itself must be downloaded and installed. The installation itself is very
straightforward.
• First, download the PHP software for your host.
• Next, a few lines in the php.ini file must be edited per PHP’s installation instructions
to reflect the software environment on that computer. For example, there is an “open_
basedir” variable that must be set to the web directory to be used for web scripts.
• In addition, database drivers must be downloaded to match the database that will
be used. These drivers are placed in PHP’s “ext” directory, and the php.ini file
updated to include the name of those drivers.
The PHP configuration can be tested by (1) creating a file with the content “<?php
phpinfo(); ?>”; (2) saving this file with the name “phpinfo.php” in the host’s Web
directory; then (3) opening a web browser and entering the following link http://
localhost/phpinfo.php.
If PHP is installed correctly, a report will be displayed showing the details of the PHP
configuration and installation.
• The database management system to be used must of course be installed, along with
the database itself and any required tables.
• In the example application reviewed in Chapter 7, SQL Server 2008 R2 was
installed on the host computer.
• Next, an SQL Server database named “SQLSvrLogs” was created with tables to
store the data needed by online users (see Section 7.3.1 for details).
The tables referred to in this chapter are an extension of the material discussed
in Section 8.5 for the review and analysis of Warning and Error messages from
SQL Server logs. Tables “WarningRecords” and “ErrorRecords” have the follow-
ing columns:
Note that the RecordKeys column is created within SQL Server using the Identity
property. This causes SQL Server to automatically increment the value for row
keys by "1" each time a new row is added. With this construct, individual rows
can be uniquely identified/used for updates without having to define uniqueness
within the content of the message strings themselves.
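The Identity behavior just described can be sketched with sqlite3, whose INTEGER PRIMARY KEY AUTOINCREMENT provides the same auto-assigned key semantics; the table layout and messages below are illustrative stand-ins for the SQL Server tables.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE warning_records (
    RecordKey INTEGER PRIMARY KEY AUTOINCREMENT,
    Message   TEXT)""")

# Two identical messages: without the auto-assigned key there would be
# no way to distinguish these rows for later updates.
con.execute("INSERT INTO warning_records (Message) VALUES ('disk nearly full')")
con.execute("INSERT INTO warning_records (Message) VALUES ('disk nearly full')")

print(con.execute("SELECT RecordKey, Message FROM warning_records").fetchall())
# [(1, 'disk nearly full'), (2, 'disk nearly full')]
```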
• To run in a 24 × 7 environment, a script must be created to periodically check
for new files in a directory used to store updates. When found, each file must be
imported using “bcp” into the appropriate table, then moved from the import
staging directory to an archive directory.
As the tables being updated have more columns than the data being imported
(the status flags/columns), a format file must be used to describe the contents of
the file(s) being imported. To create the “LogRecords.fmt” format file, use the fol-
lowing command:
This format file contains descriptions for each column in the table.
As the file being imported will only contain the message information, the format
file must be edited to reflect that only the Message column is being imported. The
edited format file will now contain
10.0
1
1 SQLCHAR 0 0 “\r\n” 2 Message SQL_Latin1_General_CP1_CI_AS
The Warning and Error log messages created by the PERL script can now be
imported using the following command syntax:
• After new messages are imported into the WarningRecords and ErrorRecords
tables, the Reviewed, Pending, and Resolved flags for each new row will have null
values. After all new rows have been imported, an SQL Server process must run
an SQL update against the WarningRecords and ErrorRecords tables that changes
all null status flags to "0" (i.e., off ).
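A minimal sketch of that post-import update, using sqlite3 as a stand-in for SQL Server; the flag column names follow the text, but the table layout here is an abbreviated assumption.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE WarningRecords (
    Message TEXT, Reviewed INTEGER, Pending INTEGER, Resolved INTEGER)""")

# A freshly imported row: only the message column is populated,
# so all three status flags are NULL.
con.execute("INSERT INTO WarningRecords (Message) VALUES ('low on log space')")

# Turn every null flag off so new rows enter the review workflow.
con.execute("""UPDATE WarningRecords
               SET Reviewed = 0, Pending = 0, Resolved = 0
               WHERE Reviewed IS NULL""")

print(con.execute(
    "SELECT Reviewed, Pending, Resolved FROM WarningRecords").fetchall())
# [(0, 0, 0)]
```

The same UPDATE, run once after each batch import, is all the "post-processing" step requires.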
When loaded from the host’s web directory, the script produces the following display on
the following page.
To log in, the user/analyst enters their logon name and password, then selects the Logon
button.
If the user or developer is testing code changes and wants to see internal trace informa-
tion from the subsequent page accesses and database calls, they would select the down
arrow by the Debug question and choose Yes before selecting Logon.
In this application, the name of the PHP script is UserLogon.php and appears below.
Note that when the form is submitted, the UserCredentials.php script is called. That
script is reviewed under the section “User Authentication.”
UserLogon.php
• This script takes the name and password provided by the user/analyst and attempts
to log on to the SQL Server “SQLSvrLogs” database.
UserCredentials.php
• Home: A return to the Home WebHeader.php page. (This page is always loaded in
subsequent pages and allows the user to get back to the primary option page.)
• Review/check warning records: The link to WarningMessages.php allows the user/
analyst to review new warning messages and provides a mechanism to change the
various flags for those records.
• Review/check error records: The link to ErrorMessages.php allows the user/analyst to
review new error messages and provides a mechanism to change the various flags for
those records.
Following on with this example, records two and four are not significant and can be
ignored by setting their Reviewed flags. This is done by clicking the Selection boxes for
these rows, clicking the arrow for Actions, and selecting the option Set Reviewed Flags for
these rows as shown in the following.
After making the above-mentioned selections and setting the Action desired, clicking
the Update button reloads this script/web page, passes parameters to the database to
update the Reviewed flags, then reruns the query to identify warning messages yet to be
reviewed. For this example, the resulting output is displayed on the next page.
• Set Reviewed flags for records identifying insignificant events/states which can be
ignored
• Set Pending flags for records that should be analyzed further
• Set Resolved flags for records that reflect issues or conditions that have been resolved
In this example, assume that the two remaining records should be researched further.
To set these flags, click the Selection columns for these records, click the Actions pull-
down and select the option Set Pending flag for these rows as shown in the display on the
following page.
Clicking on Update will repeat the process of passing these options to this web page as it
is reloaded, running the SQL updates associated with these records, and running the query
to check for any additional records.
Continuing on with this example, let us now check for warning records with a Pending
status. To see those records, first click the Actions pull-down and select the option Show
all Pending Activity.
Clicking on the Update button produces the following display, showing the two records
that were set to Pending in the previous update.
Assume that these conditions have been investigated and any required action taken to
resolve the issue. To set these flags to Resolved, click the Selection boxes, then click the
Action pull-down and select the option for Set Resolved flag for these rows as shown in the
following.
Clicking the Update button will reset these flags and update the display of records being
reviewed.
This page works identically to the one shown in Section 11.3.4 except that it is associated
with the ErrorRecords table in the database.
As described earlier, when the user chooses an action to update record flags or requests a
display of Pending or Resolved records and clicks the Update button, the script/web page
reloads with those options selected.
• If flag updates were requested, SQL updates are made to update the selected rows.
• Another query is then run to display records having no flags set.
• If an action is taken to display Pending or Resolved records, the default query SQL
string is overwritten appropriately so that when the query is run for records to be
displayed, the selected record types are retrieved/displayed.
• The user could of course simply select one of the links at the top of the display.
The user must then review the display and determine what subsequent action is needed or
select a link at the top of the display to review a different set of messages.
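The dispatch described in the bullets above can be outlined as follows. This is an illustrative Python sketch, not the book's PHP: the action codes follow the Actions list in WarningMessages.php, the helper name build_statements is ours, and the WHERE clauses paraphrase the behavior the text describes.

```python
# Sketch of the per-reload dispatch in WarningMessages.php: flag-update
# actions produce one UPDATE per selected record and re-run the default
# query; actions 5 and 6 overwrite the default query string instead.
DEFAULT_QUERY = ("SELECT Recordkey as Selection, Message, Reviewed, "
                 "Pending, Resolved from WarningRecords")

def build_statements(action, selected_keys):
    """Return (updates, query) for one reload of the page."""
    updates = []
    # Default: display records having no flags set.
    query = DEFAULT_QUERY + " WHERE Reviewed = 0 AND Pending = 0 AND Resolved = 0"
    flag_for_action = {2: "Reviewed", 3: "Pending", 4: "Resolved"}
    if action in flag_for_action:
        flag = flag_for_action[action]
        for key in selected_keys:
            updates.append(f"UPDATE WarningRecords SET {flag} = 1 "
                           f"WHERE RecordKey = {key}")
    elif action == 5:   # Show all Pending activity
        query = DEFAULT_QUERY + " WHERE Pending = 1 AND Resolved = 0"
    elif action == 6:   # Show all Resolved activity
        query = DEFAULT_QUERY + " WHERE Resolved = 1"
    return updates, query

updates, query = build_statements(3, [17, 42])
assert len(updates) == 2 and "Pending = 1" in updates[0]
assert "Resolved = 0" in query
```

Note that the listing in Appendix A interpolates RecordKey values directly into the SQL string; in new code, binding them as query parameters would avoid SQL injection, but the sketch follows the listing's string-building approach.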
With these different operations in mind, here is how they are implemented within
WarningMessages.php:
Lines 11–14: A locally developed trace/Debug option. The script contains trace calls
throughout that display message strings at execution time if the Debug option is
turned on when the user logs in.
Lines 16–80: A startup/initialization function called when the script is loaded. If the
Debug option is set, messages are displayed showing the values of key variables. It also
handles Selection options when set.
Line 82: Saving the name of the web page loaded as the default for the next web page to
be called.
Line 85: Displaying/Calling WebHeader.php.
Line 88: Calling the hidden script to log on and authenticate the user to the SQL Server
database.
Line 90: Invoking the startup function.
Lines 91, 92: Checking for SQL data returned from a prior SQL Select call.
Lines 97–282: Logic to handle the Action operation selected on a prior execution.
Lines 101–123: Handling data returned from a prior SQL Select call to the database.
Lines 125–131: Option 1—Defining the query to be used.
Lines 134–169: Option 2—The logic to update the Reviewed flag/column for the selected
records/rows in the database.
Lines 172–210: Option 3—The logic to update the Pending flag/column for the selected
records/rows in the database.
Lines 213–251: Option 4—The logic to update the Resolved flag/column for the selected
records/rows in the database.
Lines 254–264: Option 5—Reset the query statement to be issued for displaying warning
records to show all records with the Pending flag set.
Lines 267–281: Option 6—Reset the query statement to be issued for displaying warning
records to show all records with the Resolved flag set.
Lines 289–308: Logic to test for and preserve variables passed when a new web page is
called.
Lines 311–326: Logic to handle the options in the Actions drop-down list.
Lines 341–387: Logic to deal with the query to be issued, to issue the SQL call to the
database, and build a header display of the column names returned.
Lines 388–408: Logic to issue the SQL query and display results returned.
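As an aside on the header-building step (Lines 341–387), the column headers are derived from the metadata of the prepared query rather than being hard-coded. The same idea is shown below in illustrative Python, using the DB-API cursor.description and an in-memory SQLite table in place of PHP's sqlsrv_field_metadata and SQL Server; as in the listing, the first column (the record key used for the Selection checkboxes) is skipped:

```python
import sqlite3

# Build an HTML header row from query metadata, mirroring the approach in
# WarningMessages.php: the first column returned (the record key) is
# skipped, and the remaining column names become <th> cells.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE WarningRecords "
             "(RecordKey INTEGER, Message TEXT, Reviewed INTEGER, "
             "Pending INTEGER, Resolved INTEGER)")
cur = conn.execute("SELECT RecordKey, Message, Reviewed, Pending, Resolved "
                   "FROM WarningRecords")
colnames = [d[0] for d in cur.description][1:]   # skip RecordKey
header = "".join(f"<th>{name}</th>" for name in colnames)
assert header == "<th>Message</th><th>Reviewed</th><th>Pending</th><th>Resolved</th>"
```

Because the header is driven by the query's metadata, changing the SELECT list automatically changes the displayed column titles.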
One final note: all scripts and displays in Chapters 8 and 9 were taken from a working
Windows 7 computer configured and running as a web host with IIS and an SQL Server
SQLSvrLogs database loaded with actual SQL Server log messages. All scripts and
command-line instructions were taken from a live, running system. Together they hopefully
illustrate how easy it is to design and create a system to perform real-time monitoring
of events or log files, load items of interest into a database, and provide analysts with a web-
based interface to review and categorize the collected information.
QUESTIONS
1. Name the system software necessary to provide a web-based interface to a database.
2. When setting up a PHP server, where are the PHP scripts stored/located?
3. What code can an external user see when using a PHP script?
4. What restrictions are there for client systems that must use this interface?
5. How does PHP know what type of relational database management system is being
accessed?
6. In the scripts shown in the current chapter, what tables were accessed, and how were
they updated?
7. Give the name of the software component that performs verification of the user’s
name and password.
8. In this application, where are user names and passwords stored and managed?
9. Give the names of the PHP scripts and describe how they are used.
10. The first display of WarningMessages.php has titles for each of the five columns
of information from the database. Map each of the title names to columns in the
database.
This application is based on continuous updates from a PERL script that loads new
rows into the associated tables. The display of data from these tables is based on bit
settings at the time each query is run.
11. What are the bit settings for new rows after they are loaded into the database?
12. When and how are these bit settings changed?
Figure 11.1 shows the initial display of WarningRecords in the database when the
browser is first opened.
13. If the first messages were considered important and the user wants to set the Pending
flag until more research is done, what steps/selections are made by the user?
14. When making this update, how is the row being updated mapped to the information
on the web page?
15. In the WebHeader.php web page, what does the Review/Check Warning Records label
represent? What happens when it is selected?
REFERENCES
1. Valade, J., PHP 5 for Dummies, Indianapolis, IN, Wiley Publishing, 2004.
2. The World Wide Web Consortium (W3C) is an international community that develops open
standards to ensure the long-term growth of the Web. Accessed from http://www.w3.org/.
Appendix A: Warning Messages
1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
2 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
3 <head>
4 <title>SQL Server Logs</title>
5 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
6 <script type="text/javascript">
7 </script>
8 </head>
9 <body>
10 <?php
11 function trace($string) {
12 if ($_POST['debug'] == "On") {
13 echo "$string<br>";
14 }
15 }
16 function startup () {
17 //**********************************************************
18 // $_POST['debug'] = "On";
19 // $_POST['debug'] = "Off";
20
21 if (($_POST['debug']) == "On") {
22 $debug = $_POST['debug'];
23 trace ("***Message update logic***");
24 echo "...DEBUG set to $debug <br>";
25 if (isset($_POST['selection'])) {
26 $var = $_POST['selection'];
27 echo "...Selection is on and set to :$var:<br>";
28 }
29 else {
30 echo "...Selection is Off..<br>";
31 }
32 if (isset($_POST['TypeQuery'])) {
33 $var = $_POST['TypeQuery'];
34 echo "...TypeQuery is on and set to :$var:<br>";
35 }
36 else {
37 echo "...TypeQuery is Off..<br>";
38 }
39
40 if (isset($_POST['Query'])) {
41 $var = $_POST['Query'];
42 echo "...Query is on and set to :$var:<br>";
43 }
44 else {
45 echo "...Query is Off..<br>";
46 }
47
48 if (isset($_POST['formAction'])) {
49 $var = $_POST['formAction'];
50 echo "...formAction is on and set to :$var:<br>";
51 }
52 else {
53 echo "...formAction is Off..<br>";
54 }
55 if (isset($_POST['selection'])) {
56 $var = $_POST['selection'];
57 $val = $var[0];
58 echo "...selection is on and set to :$var:$val:<br>";
59 }
60 else {
61 echo "...selection is Off..<br>";
62 }
63
64 if (isset($_POST['RecordKey'])) {
65 $val = $_POST['RecordKey'];
66 echo "...RecordKey is on and set to :$val:<br>";
67 }
68 else {
69 echo "...RecordKey is Off..<br>";
70 }
71 echo "<br>***************************<br>";
72
73 }
74 trace ("Initialization...");
75 if (isset($_POST['selection'])) {
76 $data = $_POST['selection'];
77 echo "<input type=\"hidden\" name=\"selection\" value=$data />";
119 unset($_POST['Error']);
120 }
121 if ($_POST['debug'] == "On") {
122 echo "**Action option = $action<br>";
123 }
124 //*************************** option 1 Show all Warning records
125 if ($action == 1) {
126 trace ("Creating sql string for option 1");
127 $query = "SELECT Recordkey as Selection, Message, Reviewed, Pending, Resolved from WarningRecords ";
128 $_POST['Query'] = $query;
129 echo "<form action=\"WarningMessages.php\" method = \"POST\">";
130 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
131 }
132
133 //*************************** option 2 Set Reviewed flags for these rows
134 else if ($action == 2) {
135 trace ("Processing options for action 2");
136 $parms = array();
137 if ((isset($_POST['selection'])) and ($_POST['selection'] !== "") ) {
138 trace ("Reviewed Flag Updates being processed...");
139 $data = $_POST['selection'];
140 $reckey = $data[0];
141 trace ("Using record key:" . $reckey . ":");
142 $debug = $_POST['debug'];
143 if ($_POST['debug'] == "On") {
144 echo "**Reviewed selected set value = $reckey<br>";
145 }
146 foreach ($data as $reckey) {
147 $string1 = 'Update WarningRecords set Reviewed = 1 ';
148 $string2 = 'Where RecordKey = ' . $reckey;
149 $sqlcmd = "$string1 $string2";
150 if ($_POST['debug'] == "On") {
151 echo "Command issued: $sqlcmd <br>";
152 }
153 if ($return = sqlsrv_query($conn, $sqlcmd)) {
154 if ($_POST['debug'] == "On") {
155 echo "REVIEWED flag set...<br>";
156 }
157 }
158 else {
201 }
202 unset ($_POST['action']);
203 unset ($_POST['selection']);
204 unset ($_POST['RecordKey']);
205 $query = "SELECT Message, Reviewed, Pending, Resolved from WarningRecords ";
206
207 echo "<form action=\"WarningMessages.php\" method = \"POST\">";
208 echo "<input type=\"hidden\" name=\"Query\" value=\"$query\" />";
209 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
210 }
211
212 //*************************** option 4 Set Resolved flags for these rows
213 else if ($action == 4) {
214 trace ("Processing updates for option 4");
215 $parms = array();
216 if ((isset($_POST['selection'])) and ($_POST['selection'] !== "") ) {
217 trace ("Resolved Flag Updates being processed...");
218 $data = $_POST['selection'];
219 $reckey = $data[0];
220 trace ("Using record key:" . $reckey . ":");
221 $debug = $_POST['debug'];
222 if ($_POST['debug'] == "On") {
223 echo "**Reviewed selected set value = $reckey<br>";
224 }
225 foreach ($data as $reckey) {
226 $string1 = 'Update WarningRecords set Resolved = 1 ';
227 $string2 = 'Where RecordKey = ' . $reckey;
228 $sqlcmd = "$string1 $string2";
229 if ($_POST['debug'] == "On") {
230 echo "Command issued: $sqlcmd <br>";
231 }
232 if ($return = sqlsrv_query($conn, $sqlcmd)) {
233 if ($_POST['debug'] == "On") {
234 echo "Resolved flag set...<br>";
235 }
236 }
237 else {
238 echo "ah crap...<br>";
239 echo "Command issued: $sqlcmd <br>";
240 }
241 }
242 }
243 unset ($_POST['action']);
244 unset ($_POST['selection']);
245 unset ($_POST['RecordKey']);
246 $query = "SELECT Message, Reviewed, Pending, Resolved from WarningRecords ";
247
248 echo "<form action=\"WarningMessages.php\" method = \"POST\">";
249 echo "<input type=\"hidden\" name=\"Query\" value=\"$query\" />";
250 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
251 }
252
253 //*************************** option 5 Show all Pending activity
254 else if ($action == 5) {
255 trace ("Setting query for option 5");
256 $query1 = "SELECT Recordkey as Selection, Message, Reviewed, Pending, Resolved from WarningRecords ";
257 $query2 = " Where Pending = 1 and Resolved = 0";
258 $query = $query1 . $query2;
259 unset ($_POST['formAction']);
260 unset ($_POST['selection']);
261 $_POST['Query'] = $query;
262 echo "<form action=\"WarningMessages.php\" method = \"POST\">";
263 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
264 }
265
266 //*************************** option 6 Show all Resolved activity
267 else if ($action == 6) {
268 trace ("setting query for option 6");
269
270 $query1 = "SELECT Recordkey as Selection, Message, Reviewed, Pending, Resolved from WarningRecords ";
271 $query2 = "Where Resolved = 1";
272 $query = $query1 . $query2;
273 unset ($_POST['formAction']);
274 echo "<form action=\"WarningMessages.php\" method = \"POST\">";
275 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
276 $_POST['Query'] = $query;
277 }
278 echo "<form action=\"WarningMessages.php\" method = \"POST\">";
279 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
280 echo "<input type=\"hidden\" name=\"formAction\" value=\"$action\" />";
281 }
282
283 //**********************************************************
284 //**********************************************************
285 //Display data based on default starting point or user selections from above
286
287 trace("Starting Query Logic");
288
289 if (isset($_POST['TypeQuery'])){
290 $query = $_POST['TypeQuery'];
291 trace ("query set to $query<br>");
292 }
293 if (isset($_POST['Query'])) {
294 $qval = $_POST['Query'];
295 trace ("Set query string to $qval<br>");
296 }
297
298 if (isset($_POST['formAction'])) {
299 $val = $_POST['formAction'];
300 trace ("Set formAction to $val<br>");
301 }
302
303 if (isset($_POST['debug'])) {
304 $debug = $_POST['debug'];
305 }
306 else {
307 $debug = "**";
308 }
309
310 ?>
311 <form action="<?php echo $self; ?>" method = "post">
312 <label for='formAction[]'>What Action is desired?</label><br><br>
313 <select name="formAction">
314 <option value="">Actions:</option>
315 <option value="1"> Show All Log Records </option>
316 <option value="2"> Set Reviewed flags for these rows</option>
Appendix B: Error Messages
80 }
81
82 $self = $_SERVER['PHP_SELF'];
83 date_default_timezone_set('America/New_York');
84
85 include "WebHeader.php";
86 //****************** start php logic
87 // DB Variables
88 include "credentials.php"; // Connect to DB
89
90 startup();
91 $parms = array();
92 $options = array( "Scrollable" => SQLSRV_CURSOR_KEYSET );
93
94 //*****************option processing
95 //**********************************************************
96 //**********************************************************
97 if (isset($_POST['formAction']) and $_POST['formAction'] !== "") {
98 trace("Action Processing Selected...");
99 $action = $_POST['formAction'];
100 $data = $_POST['selection'];
101 if (!isset($_POST['RecordKey'])) {
102 $reckey = $data[0];
103 $_POST['RecordKey'] = $reckey;
104 }
105 trace ("Initializing with Record key = $reckey");
106 // echo "<input type=\"hidden\" name=\"RecordKey\" value=\"$reckey\" />";
107 if ($reckey === "A") {
108 echo "*****Record not selected*****...try again...<br><br>";
109 unset($_POST['formAction']);
110 $action = 0;
111 $_POST['Error'] = "On";
112 $debug = $_POST['debug'];
113 trace ("***Refresh values...first value = $reckey<br>");
114 echo "<form action=\"ErrorMessages.php\" method = \"POST\">";
115 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
116 echo "<input type=\"submit\" name=\"link\" value=\"OK\" />";
117 }
118 else {
119 unset($_POST['Error']);
120 }
242 }
243 unset ($_POST['action']);
244 unset ($_POST['selection']);
245 unset ($_POST['RecordKey']);
246 $query = "SELECT Message, Reviewed, Pending, Resolved from ErrorRecords ";
247
248 echo "<form action=\"ErrorMessages.php\" method = \"POST\">";
249 echo "<input type=\"hidden\" name=\"Query\" value=\"$query\" />";
250 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
251 }
252
253 //*************************** option 5 Show all Pending activity
254 else if ($action == 5) {
255 trace ("Setting query for option 5");
256 $query1 = "SELECT Recordkey as Selection, Message, Reviewed, Pending, Resolved from ErrorRecords ";
257 $query2 = " Where Pending = 1 and Resolved = 0";
258 $query = $query1 . $query2;
259 unset ($_POST['formAction']);
260 unset ($_POST['selection']);
261 $_POST['Query'] = $query;
262 echo "<form action=\"ErrorMessages.php\" method = \"POST\">";
263 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
264 }
265
266 //*************************** option 6 Show all Resolved activity
267 else if ($action == 6) {
268 trace ("setting query for option 6");
269
270 $query1 = "SELECT Recordkey as Selection, Message, Reviewed, Pending, Resolved from ErrorRecords ";
271 $query2 = "Where Resolved = 1";
272 $query = $query1 . $query2;
273 unset ($_POST['formAction']);
274 echo "<form action=\"ErrorMessages.php\" method = \"POST\">";
275 echo "<input type=\"hidden\" name=\"debug\" value=\"$debug\" />";
359
360 //Generate a header based on the sql query issued
361 trace ("SQL command:$query<br>");
362 $i = 0;
363 echo "<br />";
364 $return = sqlsrv_prepare($conn, $query);
365 foreach (sqlsrv_field_metadata ($return) as $fieldMetadata) {
366 foreach( $fieldMetadata as $name => $value) {
367 if ($name == 'Name') {
368 $i++;
369 if ($i > 1) {
370 $colnames[$i] = $value;
371 trace ("Name Offset $i: $value");
372 $header = $header . "<th>$colnames[$i]</th>";
373 }
374 }
375 }
376 }
377 $cols = $i;
378 ?>
379 <table class="FloatTitle" onMouseOver="javascript:trackTableHighlight(event, '#99cc99');" onMouseOut="javascript:highlightTableRow(0);">
380 <thead>
381 <tr class="header" style='background-color:#666666;font-size:8pt;color:#fff;border:1px solid #c3c3c3;padding:3px;'>
382 </tr></thead>
383 <tbody>
384 <?php
385 //***************************************** Display rows based on sql query
386 //*****************************************
387 echo "$header";
388 $return = sqlsrv_query ($conn, $query, $parms, $options);
389 if ($return == false) {
390 die ( print_r( sqlsrv_errors(), true));
391 }
392 $i = 0;
393 $numrows = sqlsrv_num_rows($return);
394 if ($numrows == 0) {
395 #print message when no rows found
396 echo "$string<br>";
397 echo "No new records found in the database...<br>";
398 }
399 trace ("Found " . $numrows . " rows in the database");
Appendix C
go
@validcnt int,
@insAssignmentNo char(18),
@insCourseID varchar(50),
@insSectionID varchar(50),
@errno int,
@severity int,
@state int,
@errmsg varchar(255)
FK_CONSTRAINT="R_14", FK_COLUMNS="CourseID""SectionID" */
IF
/* %ChildFK(" OR",UPDATE) */
UPDATE(CourseID) OR
UPDATE(SectionID)
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Course_Offering
WHERE
/* %JoinFKPK(inserted,Course_Offering) */
inserted.CourseID = Course_Offering.CourseID and
inserted.SectionID = Course_Offering.SectionID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," AND") */
IF @validcnt + @nullcnt != @numrows
BEGIN
SELECT @errno = 30007,
@errmsg = 'Cannot update Assignment because Course_
Offering does not exist.'
GOTO error
END
END
go
CREATE TRIGGER tD_Course ON Course FOR DELETE AS
/* erwin Builtin Trigger */
/* DELETE trigger on Course */
BEGIN
DECLARE @errno int,
@severity int,
@state int,
@errmsg varchar(255)
/* erwin Builtin Trigger */
/* Course Course_Offering on parent delete restrict */
/* ERWIN_RELATION:CHECKSUM="000536a7", PARENT_OWNER="",
PARENT_TABLE="Course"
CHILD_OWNER="", CHILD_TABLE="Course_Offering"
Appendix C ◾ 333
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_29", FK_COLUMNS="CourseID" */
IF EXISTS (
SELECT * FROM deleted,Course_Offering
WHERE
/* %JoinFKPK(Course_Offering,deleted," = "," AND") */
Course_Offering.CourseID = deleted.CourseID
)
BEGIN
SELECT @errno = 30001,
@errmsg = 'Cannot delete Course because Course_
Offering exists.'
GOTO error
END
/* erwin Builtin Trigger */
/* Course Curriculum_Course on parent delete restrict */
/* ERWIN_RELATION:CHECKSUM="00000000", PARENT_OWNER="",
PARENT_TABLE="Course"
CHILD_OWNER="", CHILD_TABLE="Curriculum_Course"
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_26", FK_COLUMNS="CourseID" */
IF EXISTS (
SELECT * FROM deleted,Curriculum_Course
WHERE
/* %JoinFKPK(Curriculum_Course,deleted," = "," AND") */
Curriculum_Course.CourseID = deleted.CourseID
)
BEGIN
SELECT @errno = 30001,
@errmsg = 'Cannot delete Course because Curriculum_
Course exists.'
GOTO error
END
BEGIN
SELECT @errno = 30001,
@errmsg = 'Cannot delete Course because Course_
Prerequisite exists.'
GOTO error
END
go
go
)
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Course because Course_
Offering exists.'
GOTO error
END
END
WHERE
/* %JoinFKPK(Course_Prerequisite,deleted," = "," AND") */
Course_Prerequisite.CourseID = deleted.CourseID
)
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Course because Course_
Prerequisite exists.'
GOTO error
END
END
/* %ParentPK(" OR",UPDATE) */
UPDATE(CourseID)
BEGIN
IF EXISTS (
SELECT * FROM deleted,Course_Prerequisite
WHERE
/* %JoinFKPK(Course_Prerequisite,deleted," = "," AND") */
Course_Prerequisite.CourseID = deleted.CourseID
)
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Course because Course_
Prerequisite exists.'
GOTO error
END
END
go
go
GOTO error
END
END
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Student
WHERE
/* %JoinFKPK(inserted,Student) */
inserted.StudentID = Student.StudentID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," and") */
go
UPDATE(StudentID)
BEGIN
IF EXISTS (
SELECT * FROM deleted,Student_Grade
WHERE
/* %JoinFKPK(Student_Grade,deleted," = "," AND") */
Student_Grade.CourseID = deleted.CourseID AND
Student_Grade.SectionID = deleted.SectionID AND
Student_Grade.StudentID = deleted.StudentID
)
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Course_Enrollment because
Student_Grade exists.'
GOTO error
END
END
WHERE
/* %JoinFKPK(inserted,Student) */
inserted.StudentID = Student.StudentID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," AND") */
go
FK_CONSTRAINT="R_31", FK_COLUMNS="CourseID""SectionID" */
IF EXISTS (
SELECT * FROM deleted,Assignment
WHERE
/* %JoinFKPK(Assignment,deleted," = "," AND") */
Assignment.CourseID = deleted.CourseID AND
Assignment.SectionID = deleted.SectionID
)
BEGIN
SELECT @errno = 30001,
@errmsg = 'Cannot delete Course_Offering because
Assignment exists.'
GOTO error
END
WHERE
/* %JoinFKPK(Course_Enrollment,deleted," = "," AND") */
Course_Enrollment.CourseID = deleted.CourseID AND
Course_Enrollment.SectionID = deleted.SectionID
)
BEGIN
SELECT @errno = 30001,
@errmsg = 'Cannot delete Course_Offering because
Course_Enrollment exists.'
GOTO error
END
go
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Faculty
WHERE
/* %JoinFKPK(inserted,Faculty) */
inserted.FacultyID = Faculty.FacultyID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," and") */
go
IF
/* %ParentPK(" OR",UPDATE) */
UPDATE(CourseID) OR
UPDATE(SectionID)
BEGIN
IF EXISTS (
SELECT * FROM deleted,Course_Enrollment
WHERE
/* %JoinFKPK(Course_Enrollment,deleted," = "," AND") */
Course_Enrollment.CourseID = deleted.CourseID AND
Course_Enrollment.SectionID = deleted.SectionID
)
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Course_Offering because
Course_Enrollment exists.'
GOTO error
END
END
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Course_Offering because
Course_Enrollment exists.'
GOTO error
END
END
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Course
WHERE
/* %JoinFKPK(inserted,Course) */
inserted.CourseID = Course.CourseID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," AND") */
go
@validcnt int,
@errno int,
@severity int,
@state int,
@errmsg varchar(255)
go
CHILD_OWNER="", CHILD_TABLE="Course_Prerequisite"
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_22", FK_COLUMNS="CourseID" */
IF
/* %ChildFK(" OR",UPDATE) */
UPDATE(CourseID)
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Course
WHERE
/* %JoinFKPK(inserted,Course) */
inserted.CourseID = Course.CourseID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," AND") */
go
go
CHILD_OWNER="", CHILD_TABLE="Curriculum"
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_24", FK_COLUMNS="DegreeID" */
IF
/* %ChildFK(" OR",UPDATE) */
UPDATE(DegreeID)
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Degree
WHERE
/* %JoinFKPK(inserted,Degree) */
inserted.DegreeID = Degree.DegreeID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," and") */
go
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Curriculum because
Curriculum_Course exists.'
GOTO error
END
END
go
FK_CONSTRAINT="R_25", FK_COLUMNS="CurriculumID" */
IF
/* %ChildFK(" OR",UPDATE) */
UPDATE(CurriculumID)
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Curriculum
WHERE
/* %JoinFKPK(inserted,Curriculum) */
inserted.CurriculumID = Curriculum.CurriculumID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," and") */
go
go
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Curriculum
WHERE
/* %JoinFKPK(inserted,Curriculum) */
inserted.CurriculumID = Curriculum.CurriculumID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," and") */
IF @validcnt + @nullcnt != @numrows
BEGIN
SELECT @errno = 30002,
@errmsg = 'Cannot insert Curriculum_Job_
Classification because Curriculum does not exist.'
GOTO error
END
END
/* erwin Builtin Trigger */
/* Job_Classification Curriculum_Job_Classification on child
insert restrict */
/* ERWIN_RELATION:CHECKSUM="00000000", PARENT_OWNER="",
PARENT_TABLE="Job_Classification"
CHILD_OWNER="", CHILD_TABLE="Curriculum_Job_Classification"
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_23", FK_COLUMNS="ClassificationID" */
IF
/* %ChildFK(" OR",UPDATE) */
UPDATE(ClassificationID)
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Job_Classification
WHERE
/* %JoinFKPK(inserted,Job_Classification) */
inserted.ClassificationID = Job_Classification.
ClassificationID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," and") */
IF @validcnt + @nullcnt != @numrows
BEGIN
SELECT @errno = 30002,
@errmsg = 'Cannot insert Curriculum_Job_
Classification because Job_Classification does not exist.'
GOTO error
END
END
@validcnt int,
@insClassificationID varchar(50),
@insCurriculumID char(18,50),
@errno int,
@severity int,
@state int,
@errmsg varchar(255)
IF
/* %ChildFK(" OR",UPDATE) */
UPDATE(ClassificationID)
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Job_Classification
WHERE
/* %JoinFKPK(inserted,Job_Classification) */
inserted.ClassificationID = Job_Classification.
ClassificationID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," AND") */
go
error:
RAISERROR (@errmsg, -- Message text.
@severity, -- Severity (0~25).
@state) -- State (0~255).
rollback transaction
END
go
GOTO error
END
END
go
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Degree because Curriculum
exists.'
GOTO error
END
END
go
go
go
CHILD_OWNER="", CHILD_TABLE="Course"
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_21", FK_COLUMNS="DepartmentID" */
IF
/* %ParentPK(" OR",UPDATE) */
UPDATE(DepartmentID)
BEGIN
IF EXISTS (
SELECT * FROM deleted,Course
WHERE
/* %JoinFKPK(Course,deleted," = "," AND") */
Course.DepartmentID = deleted.DepartmentID
)
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Department because Course
exists.'
GOTO error
END
END
go
go
go
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Job_Classification because
Curriculum_Job_Classification exists.'
GOTO error
END
END
GOTO error
END
go
GOTO error
END
END
go
@errmsg varchar(255)
/* erwin Builtin Trigger */
/* Student Student_Transcript on parent delete restrict */
/* ERWIN_RELATION:CHECKSUM="00033391", PARENT_OWNER="",
PARENT_TABLE="Student"
CHILD_OWNER="", CHILD_TABLE="Student_Transcript"
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_36", FK_COLUMNS="StudentID" */
IF EXISTS (
SELECT * FROM deleted,Student_Transcript
WHERE
/* %JoinFKPK(Student_Transcript,deleted," = "," AND") */
Student_Transcript.StudentID = deleted.StudentID
)
BEGIN
SELECT @errno = 30001,
@errmsg = 'Cannot delete Student because Student_
Transcript exists.'
GOTO error
END
FK_CONSTRAINT="R_18", FK_COLUMNS="StudentID" */
IF EXISTS (
SELECT * FROM deleted,Course_Enrollment
WHERE
/* %JoinFKPK(Course_Enrollment,deleted," = "," AND") */
Course_Enrollment.StudentID = deleted.StudentID
)
BEGIN
SELECT @errno = 30001,
@errmsg = 'Cannot delete Student because Course_
Enrollment exists.'
GOTO error
END
go
IF
/* %ParentPK(" OR",UPDATE) */
UPDATE(StudentID)
BEGIN
IF EXISTS (
SELECT * FROM deleted,Student_Transcript
WHERE
/* %JoinFKPK(Student_Transcript,deleted," = "," AND") */
Student_Transcript.StudentID = deleted.StudentID
)
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Student because Student_
Transcript exists.'
GOTO error
END
END
CHILD_OWNER="", CHILD_TABLE="Course_Enrollment"
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_18", FK_COLUMNS="StudentID" */
IF
/* %ParentPK(" OR",UPDATE) */
UPDATE(StudentID)
BEGIN
IF EXISTS (
SELECT * FROM deleted,Course_Enrollment
WHERE
/* %JoinFKPK(Course_Enrollment,deleted," = "," AND") */
Course_Enrollment.StudentID = deleted.StudentID
)
BEGIN
SELECT @errno = 30005,
@errmsg = 'Cannot update Student because Course_
Enrollment exists.'
GOTO error
END
END
go
/* ERWIN_RELATION:CHECKSUM="0001dd00", PARENT_OWNER="",
PARENT_TABLE="Course_Enrollment"
CHILD_OWNER="", CHILD_TABLE="Student_Grade"
P2C_VERB_PHRASE="", C2P_VERB_PHRASE="",
FK_CONSTRAINT="R_27", FK_COLUMNS="CourseID""SectionID"
"StudentID" */
IF
/* %ChildFK(" OR",UPDATE) */
UPDATE(CourseID) OR
UPDATE(SectionID) OR
UPDATE(StudentID)
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Course_Enrollment
WHERE
/* %JoinFKPK(inserted,Course_Enrollment) */
inserted.CourseID = Course_Enrollment.CourseID and
inserted.SectionID = Course_Enrollment.SectionID and
inserted.StudentID = Course_Enrollment.StudentID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," and") */
go
BEGIN
DECLARE @numrows int,
@nullcnt int,
@validcnt int,
@insCourseID varchar(50),
@insSectionID varchar(50),
@insStudentID varchar(50),
@errno int,
@severity int,
@state int,
@errmsg varchar(255)
go
go
IF
/* %ChildFK(" OR",UPDATE) */
UPDATE(StudentID)
BEGIN
SELECT @nullcnt = 0
SELECT @validcnt = count(*)
FROM inserted,Student
WHERE
/* %JoinFKPK(inserted,Student) */
inserted.StudentID = Student.StudentID
/* %NotnullFK(inserted," IS NULL","select @nullcnt = count(*)
from inserted where"," AND") */
go
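The trigger fragments above are ERwin's template-generated way of enforcing referential integrity among Student, Course_Enrollment, and Student_Transcript: a parent row cannot be deleted or re-keyed while child rows reference it, and a child row must reference an existing parent (NULL foreign keys excepted). The same rules can be enforced declaratively with FOREIGN KEY constraints. As a rough illustration only (not from the book's listings), the sketch below demonstrates both rules using SQLite from Python; the table and column names follow the university model, and the varchar(50) types are carried over from the declarations above.

```python
import sqlite3

# In-memory database; SQLite enforces declarative FKs once the pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE Student (StudentID VARCHAR(50) PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Course_Enrollment (
        CourseID  VARCHAR(50),
        SectionID VARCHAR(50),
        StudentID VARCHAR(50) REFERENCES Student (StudentID)
    )""")

conn.execute("INSERT INTO Student VALUES ('S1')")
conn.execute("INSERT INTO Course_Enrollment VALUES ('C1', '01', 'S1')")

# Rule 1 (parent side, like the DELETE trigger): a referenced Student
# row cannot be deleted while Course_Enrollment rows point at it.
try:
    conn.execute("DELETE FROM Student WHERE StudentID = 'S1'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Rule 2 (child side, like the %ChildFK INSERT/UPDATE check): a child
# row must reference an existing parent row.
try:
    conn.execute("INSERT INTO Course_Enrollment VALUES ('C2', '01', 'NOSUCH')")
    orphan_blocked = False
except sqlite3.IntegrityError:
    orphan_blocked = True

print(delete_blocked, orphan_blocked)  # True True
```

The declarative form replaces both trigger bodies with a single constraint the engine checks automatically; SQL Server supports the same `FOREIGN KEY ... REFERENCES` syntax, which the book's DDL appendices use alongside these generated triggers.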
Appendix D: Search for Terms
Appendix E

Index
L
Logical data model
  3NF, 42f
  university database, 85, 86f, 87f
  using erwin, 96–108

M
Many-to-many (M:M) relationship, 28
Microsoft Access, 10, 117–118
  advantages, 155–156
  database design modifications, 118
  data import mechanism, 119
  features, 117
  forms, 129
    Master screen, 146
    for new customer, 141–146
    for updating data in tables, 129–141
  as GUI, 232
    advantages, 232–233
    capabilities, 232
    disadvantages, 233
  linking to SQL Server/Oracle database, 155
  in office environment, 231
    advantages and disadvantages, 232
    capabilities, 231
  Pass-Through queries, 155–156, 231
  PHP and, 275
  physical data model using, 47
    indexes creation, 55–56
    Referential Integrity constraints in, 51–54
    table creation, 47–50
  query, 119–125
    results, 125–129
    using SQL commands, 125
  Query wizard, 119, 125, 129, 155
  reports, 146–153
  tables in, 118
    loading data into, 118–119
  for team of users, 153–155
Microsoft’s Visual Studio, 275
M:M (many-to-many) relationship, 28
Modulus operator (%), 218
Mutually exclusive data, entity/attribute error, 21, 23
MySQL, 3–4

N
Normalization process, 17, 24, 75. See also Data normalization

O
One-to-many (1:M) relationship, 28
  3NF data, 35–36
Oracle, 10–11
  physical data model using, 72
Order entry model, 18

P
Pass-Through queries, 155–156, 231
Perl, 213
  applications and uses, 226–227
  “\” character in, 220
  functions
    arrays in, 215–216
    chomp, 214
    hashes, 217–218
    if/then/else, 215
    index, 214
    int, 218
    length, 214
    modulus operator, 218–219
    “.” operator, 218
    sleep, 215
    split, 214
    substitute operator, 215
    substr, 214
    system, 215, 226
    while loop, 214
  key matching features, 221–222
  loading data into tables, 227–229
  versus Python, 219
  scripts to monitoring
    SQL Server logs, 222–224, 224f
    Windows logs, 225, 226f
  to search file, 215f, 216–217, 217f, 405–406
    in Oracle database, 216–217
  search patterns, 214
  Warning and Error log messages, 278
  in Windows versus Unix, 219–221
PHP, 234, 275–276
  advantages, 234
  configuration testing, 277
  disadvantages, 235
  features, 276
  format file command, 278
  Handler Mapping, 277
  IIS and, 276–277
  installation, 277
  and Java, 275
  and Microsoft Access, 275
PHP (Continued)
  user interface and, 275
  web-based application, 276
    error messages, 294–295, 309–319
    system components, 276–279
    warning messages, 292–294, 297–307
  web-based interface, 279
    home page user options, 283
    review/check error records, 292
    review/check warning records, 283–291
    user authentication, 281–282
    user logon options, 280–281
Physical data model, 41, 46f
  access paths, 42–44
  indexes, 45–46
  table
    creation, 46
    key and column data types, 44–45
  university database, 88–91
    Referential Integrity constraints, 241–244
    tables creation, 237–240
  using erwin, 109–113
  using Microsoft Access, 47
    indexes creation, 55–56
    Referential Integrity constraints in, 51–54
    table creation, 47–50
  using SQL Server
    database creation, 56–59
    indexes in, 69–72
    Referential Integrity constraints in, 62–68
    table creation, 60–62
Production environment, complex systems, 95
Python, 219

Q
Query wizard, 119, 125, 129, 155

R
Redundant array of inexpensive disks (RAID), 12, 159
Redundant information, entity/attribute error, 20–21, 23
Referential Integrity, 6–7
  constraints, 3, 51–54, 62–68, 180–184, 241–244
Relational database management system (RDBMS), 3, 117. See also Microsoft Access; Structured Query Language (SQL) Server
Reports, Microsoft Access, 146–153
Reverse engineering, 93–94

S
Second normal form (2NF), 29, 31
  entity/attribute list, 32f
Sleep function, Perl, 215
Split function, Perl, 214
SQL. See Structured Query Language (SQL)
SQL Server Agent, 208–210
SQL Server Management Studio, 56, 68, 157, 170, 184, 204, 210
SQLSvrLogs database, 171–174, 207
Stored Procedure, 207–208
Structured Query Language (SQL)
  language, 5–6
  for query creation, 125
Structured Query Language (SQL) Server, 2, 10, 157
  advantages, 157
  authorized users, 186–190
  Change Management, 157–158
  databases
    backup/recovery services, 190–194
    creation, 170
  installation
    on laptop, 159
    preinstallation considerations, 158–159
    prerequisites, 159
    on server, 159
    software, 160–169
  loading data into tables, 195–201
  logs, 222
    check, 407–408
    Perl scripts to monitoring, 222–224, 224f
  manual queries and edits, 204–207
  Microsoft Access to, 155
  physical data model using
    database creation, 56–59
    indexes in, 69–72
    Referential Integrity constraints in, 62–68
    table creation, 60–62
  SQL Server Agent, 208–210
  SQLSvrLogs, 171–174, 207
  Stored Procedure, 207–208
  university database, 114–115, 174
    indexes, 177–180
    Referential Integrity constraints, 180–184
    table creation, 174–177
  user roles, 184–186
  View creation, 202–204
Substitute operator, Perl, 215
Substr function, Perl, 214
Synonyms, entity/attribute errors, 20–21, 23