Chapter Two
Introduction
For financial and/or legal reasons, organizations collect and store vast amounts of data about employees, customers, finances, vendors, inventory, competitors, and markets, to name only a few. The amount of data matters because people generally make better decisions when more data is available to them.
Honelign,2012 1
Data cannot be understood until it is analyzed. As the manager begins to process and analyze the data, it eventually begins to tell a story. A computer cannot process data unless it is organized in special ways: into characters, fields, records, files, and databases.
Character
A character is the most basic element of data that can be observed and manipulated, e.g., $, #, and ?.
Attribute / Field
A field contains an item of data; that is, a character, or group of characters that are related. For instance, a grouping of related text characters such as "John Smith" makes up a name in the name field.
An attribute is a descriptive property of an entity. Synonyms include element, property, and field. Generally there are four types of fields:
primary key, secondary key, foreign key, and descriptive (non-key) fields.
A primary key is the attribute, or combination of attributes, that uniquely identifies a specific row in a table. A secondary key is an alternative identifier for records in a database. It may identify either a single record (as with a primary key) or a subset of records. A foreign key is an attribute in a table that is a primary key in another table. Foreign keys are used to link tables.
Foreign keys are pointers to the records of a different file in a database. A foreign key in one file requires the existence of the corresponding primary key in the other table or file; otherwise it does not point to anything. A descriptive field is any other non-key field that stores business data.
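The four field types can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names (customer, sales_order, cust_id, city) and the sample rows are hypothetical and do not come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,  -- primary key: unique per row
        name    TEXT NOT NULL,        -- descriptive (non-key) field
        city    TEXT                  -- used below as a secondary key
    )""")
conn.execute("""
    CREATE TABLE sales_order (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER NOT NULL REFERENCES customer(cust_id),  -- foreign key
        amount   REAL                                            -- descriptive field
    )""")
# A secondary key may identify a subset of records, so the index is not unique.
conn.execute("CREATE INDEX idx_customer_city ON customer(city)")

conn.execute("INSERT INTO customer VALUES (1, 'John Smith', 'Boston')")
conn.execute("INSERT INTO sales_order VALUES (100, 1, 250.0)")

def insert_orphan_order(connection):
    """Try to store an order whose foreign key matches no customer."""
    try:
        connection.execute("INSERT INTO sales_order VALUES (101, 99, 75.0)")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

orphan_result = insert_orphan_order(conn)
print(orphan_result)
```

The second order is refused because customer 99 does not exist, which is the "otherwise it does not point to anything" case described above.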
Record
A record is composed of a group of related fields. A record contains a collection of attributes related to an entity such as a person or product. For example, a payroll record would contain the name, address, social security number, and title of an employee.
Database File
A database file is a collection of related records. A database file is sometimes called a table. A file may be composed of a complete list of individuals on a mailing list, including their addresses and telephone numbers. Files are frequently categorized by the purpose or application for which they are intended. Common examples include mailing lists, quality control files, inventory files, and document files.
Database
A database is composed of related files that are consolidated, organized, and stored together. One collection of related files might pertain to employee information. Another collection of related files might contain sports statistics.
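The hierarchy of characters, fields, records, files, and databases can be sketched in a few lines of Python. The class name EmployeeRecord, its fields, and the sample values are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    """A record: a group of related fields describing one entity."""
    name: str      # a field: a group of related characters
    address: str
    title: str

# A file: a collection of related records.
payroll_file = [
    EmployeeRecord("John Smith", "12 Elm St", "Clerk"),
    EmployeeRecord("Mary Jones", "40 Oak Ave", "Auditor"),
]

# A database: related files consolidated and stored together.
database = {"payroll": payroll_file, "mailing_list": []}

# A character: the most basic element, here the first one in a name field.
first_character = database["payroll"][0].name[0]
print(first_character)
```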
In a flat-file approach, each user group owns its data and it is not usually available to others, even within the organization. Thus, the same data element may be represented in all user files. This is called data redundancy.
[Figure: flat-file model. Labels: User 2 Transactions, Program 2, data elements X, B, Y; User 3 Transactions, Program 3, data elements L, B, M]
Data Redundancy & Flat-File Problems
Data Storage - creates excessive storage costs for paper documents and/or magnetic media.
Data Updating - any changes or additions must be performed multiple times.
Currency of Information - there is a potential problem of failing to update all affected files.
Task-Data Dependency - the user's inability to obtain additional information as his or her needs change.
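The updating and currency problems can be illustrated with a small Python sketch. The three files, the customer number C001, and the addresses are hypothetical; the point is only that one data element is stored in several user files.

```python
# Each user group owns its own file; the address is duplicated in all three.
billing_file = {"C001": {"name": "John Smith", "address": "12 Elm St"}}
shipping_file = {"C001": {"address": "12 Elm St", "carrier": "Ground"}}
marketing_file = {"C001": {"address": "12 Elm St", "segment": "Retail"}}

flat_files = [billing_file, shipping_file, marketing_file]

def update_address(files, cust_id, new_address):
    """Apply one change to every file passed in; return how many updates ran."""
    updates = 0
    for user_file in files:
        if cust_id in user_file:
            user_file[cust_id]["address"] = new_address
            updates += 1
    return updates

# Data updating: a single change has to be performed more than once.
updates_done = update_address([billing_file, shipping_file], "C001", "99 Oak Ave")

# Currency of information: marketing_file was missed, so the files now disagree.
addresses = {user_file["C001"]["address"] for user_file in flat_files}
print(updates_done, sorted(addresses))
```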
Because it is so different from the file-oriented approach, the database approach requires training users, and there may be inertia or resistance.
2.3.1. Users
Users access data in two ways: via user programs that send data access requests to the DBMS, and through direct query, which requires no formal user program.
2.3.2. DBMS
The purpose of the DBMS is to provide controlled access to the database. The DBMS is a special software system programmed to know which data elements each user is authorized to access and to deny unauthorized requests for data.
Conceptual view - the logical and abstract representation of the database. There is only one conceptual view of the database.
User view - defines how a particular user sees his or her portion of the database. There are many user views of a database.
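One conceptual table with several user views can be sketched with SQL views in Python's sqlite3 module. The employee table, the view names, and the rows are hypothetical. SQLite has no user accounts or GRANT statement, so this shows only how views are defined, not how access authority is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The single conceptual view: every attribute of every employee.
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name TEXT,
        dept TEXT,
        hourly_rate REAL
    )""")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "J. Smith", "Sales", 20.0), (2, "M. Jones", "Audit", 25.0)],
)

# Two user views: payroll staff see pay rates, the staff directory does not.
conn.execute(
    "CREATE VIEW payroll_view AS SELECT emp_id, name, hourly_rate FROM employee")
conn.execute(
    "CREATE VIEW directory_view AS SELECT name, dept FROM employee")

def columns_of(connection, relation):
    """Return the column names a user of this table or view can see."""
    cursor = connection.execute(f"SELECT * FROM {relation}")
    return [column[0] for column in cursor.description]

print(columns_of(conn, "payroll_view"))
print(columns_of(conn, "directory_view"))
```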
Query Language
The query capability permits end users and professional programmers to access data in the database without the need for conventional programs. ANSI's Structured Query Language (SQL) is a fourth-generation language that has emerged as the standard query language. SQL is a nonprocedural language with many commands that allow users to input, retrieve, and modify data easily. The SELECT command is a powerful tool for retrieving data.
SQL is an efficient data processing tool that requires far less training in computer concepts and fewer programming skills than many languages. This feature places ad hoc reporting and data processing capability in the hands of the user/manager. By reducing reliance on professional programmers, managers are better able to deal with problems that pop up. The example in the next figure illustrates the use of the SELECT command to produce a user report from a database called Inventory.
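A minimal sketch of a SELECT query against an Inventory table, run through Python's sqlite3 module. The column names, the sample rows, and the selection condition are assumptions for illustration and are not taken from the original figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Inventory (
        item TEXT PRIMARY KEY,
        description TEXT,
        on_hand INTEGER,
        unit_cost REAL
    )""")
conn.executemany(
    "INSERT INTO Inventory VALUES (?, ?, ?, ?)",
    [
        ("1567", "Bolt 3/8", 300, 1.34),
        ("1568", "Nut 1/4", 500, 0.85),
        ("1569", "Flange", 65, 56.75),
    ],
)

# SELECT names the columns wanted, FROM names the table,
# and WHERE states the condition rows must satisfy.
report = conn.execute(
    "SELECT item, description, on_hand "
    "FROM Inventory WHERE on_hand > 100 ORDER BY item"
).fetchall()

for row in report:
    print(row)
```

The user states what data is wanted, not how to find it, which is what makes the language nonprocedural.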
Organizational Interactions of the DBA
Of particular importance is the relationship among the DBA, the end users, and the systems professionals of the organization. As information needs arise, users send formal requests for computer applications to the systems professionals (programmers) of the organization. The requests are handled through formal systems development procedures, which produce the programmed applications.
The user requests also go to the DBA, who evaluates them to determine the user's database needs. Once this is established, the DBA grants the user access authority by programming the user's view (subschema). This relationship is shown as the lines between the user and the DBA and between the DBA and the DDL module in the DBMS. By keeping access authority separate from systems development (application programming), the organization is better able to control and protect the database. Intentional and unintentional attempts at unauthorized access are more likely to be discovered when these two groups work independently.
Data structures
Data structures are the bricks and mortar of the database. They allow records to be located, stored, and retrieved, and enable movement from one record to another.
In general, data structures must support the following file processing operations:
Retrieve a record from the file based on its primary key value
Insert a record into a file
Update a record in the file
Read a complete file of records
Find the next record in a file
Scan a file for records with common secondary keys
Delete a record from a file
Criteria for data structure selection
No single structure is best for all processing tasks. Therefore, the following criteria are used to select a data structure:
Rapid file access and data retrieval
Efficient use of disk storage
High throughput for transaction processing
Protection from data loss
Ease of recovery from system failure
Accommodation of file growth
i. Sequential structure
Also called the sequential access method. Records in the file lie in contiguous storage spaces in a specified sequence arranged by their primary key. Sequential files are simple and easy to process, but the structure does not permit accessing a record directly. Thus, it is efficient only for the following operations:
Read a complete file of records
Find the next record in a file
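A short Python sketch shows why a sequential file suits full-file reads but not direct retrieval. The list of (primary key, name) pairs is a hypothetical stand-in for records stored contiguously in key order.

```python
# Records stored one after another, in primary key order.
sequential_file = [(101, "Ray"), (205, "Paul"), (310, "Jones"), (420, "Buel")]

def read_all(records):
    """Read a complete file of records: one pass, no searching needed."""
    return [record for record in records]

def find_by_key(records, key):
    """Locate one record by reading from the start; return (record, reads)."""
    reads = 0
    for record in records:
        reads += 1
        if record[0] == key:
            return record, reads
        if record[0] > key:
            break  # keys are in order, so the record cannot appear later
    return None, reads

print(find_by_key(sequential_file, 310))
print(find_by_key(sequential_file, 150))
```

Reaching the third record takes three reads, because every record ahead of it must be passed first.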
The physical organization of the index itself may be either sequential (by key value) or random. The advantage of indexed random files is their efficiency in the following single-record processing operations:
Retrieve a record from the file based on its primary key value
Insert a record into a file
Update a record in the file
Scan a file for records with common secondary keys
They also make efficient use of disk storage.
Disadvantage: they are not efficient for operations that involve processing a large portion of a file.
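The single-record operations listed above can be sketched with Python dictionaries standing in for the disk and its indexes. The addresses, keys, and department values are hypothetical.

```python
# Records sit at whatever disk address was free; addresses follow no key order.
disk = {
    2: (101, "Ray", "Acct"),
    5: (205, "Paul", "Acct"),
    7: (310, "Jones", "Math"),
}

# The index maps each primary key to the address of its record.
pk_index = {101: 2, 205: 5, 310: 7}
# A secondary index maps a non-unique value to several addresses.
dept_index = {"Acct": [2, 5], "Math": [7]}

def retrieve(pk):
    """Retrieve a record by primary key: one index lookup, one disk read."""
    return disk[pk_index[pk]]

def insert(address, record):
    """Insert a record at any free address and register it in both indexes."""
    disk[address] = record
    pk_index[record[0]] = address
    dept_index.setdefault(record[2], []).append(address)

def scan_secondary(dept):
    """Scan for all records sharing a secondary key value."""
    return [disk[address] for address in dept_index.get(dept, [])]

insert(4, (150, "Buel", "Mgt"))
print(retrieve(205))
print(scan_secondary("Acct"))
```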
An ISAM file has three physical components: the indexes, the prime data storage area, and the overflow area.
ISAM is a popular option for large and stable files that need both direct access and batch processing, but not for highly volatile files.
Hashing eliminates the need for a separate index. It uses a random file organization, since the process of calculating residuals and converting them into storage locations produces widely dispersed record addresses.
Disadvantages:
It does not use storage space efficiently, because some disk locations will never be selected by the algorithm.
Collisions (the reverse of the first problem: different keys are converted to the same location) slow down access speed (see the book, p. 421).
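Both disadvantages show up in a small Python sketch. It assumes a simplified hashing algorithm in which the storage address is the remainder (residual) of the key divided by a prime number; the prime 7 and the keys are hypothetical, and the algorithm in the book may differ in detail.

```python
from collections import defaultdict

PRIME = 7  # assumed divisor; it also fixes the number of storage locations

def hash_address(key, prime=PRIME):
    """Convert a record key to a storage address: the residual of key / prime."""
    return key % prime

keys = [15943, 15950, 20001, 31416]

# Group the keys by the address the algorithm assigns to them.
locations = defaultdict(list)
for key in keys:
    locations[hash_address(key)].append(key)

# Collision: more than one key converted to the same address.
collisions = {addr: ks for addr, ks in locations.items() if len(ks) > 1}
# Wasted space: addresses that no key was converted to.
unused = sorted(set(range(PRIME)) - set(locations))

print(dict(locations))
print(collisions)
print(unused)
```

No index is consulted at any point: the address is computed directly from the key.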
Types of Pointers
There are three types of pointers.
Data elements
Attributes are the data elements that define an entity.
For example, an Employee entity may be defined by the following partial set of attributes: Name, Address, Job Skill, Years of Service, and Hourly Rate of Pay.
Record type - a group of data elements that logically pertain to an entity.
Record associations - the relationships that exist among record types.
Record associations
Record types exist in relation to other record types. This is called an association. There are three types of record associations:
One-to-One, e.g., employee record to year-to-date earnings record
One-to-Many, e.g., customer record to sales order record
Many-to-Many (two-way relationship), e.g., inventory record to vendor record
A file can be both the child in one set and the parent in another set, but it cannot be both within the same set. Files at the same level with the same parent are called siblings. This structure is also called a tree structure. The file at the most aggregated level in the tree is the root.
The only way to access data at lower levels in the tree is from the root, via the pointers down the navigational path to the desired records; i.e., it allows only one path.
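The single navigational path can be sketched in Python as a tree of parent and child files. The file names (Customer, Sales Invoice, Line Item, Cash Receipt) are hypothetical examples.

```python
from dataclasses import dataclass, field

@dataclass
class FileNode:
    """One file in the tree; its pointers lead only downward to its children."""
    name: str
    children: list = field(default_factory=list)

# Customer is the root. Sales Invoice and Cash Receipt are siblings.
# Sales Invoice is a child in one set and the parent in another.
root = FileNode("Customer", [
    FileNode("Sales Invoice", [FileNode("Line Item")]),
    FileNode("Cash Receipt"),
])

def path_to(node, target, path=()):
    """Walk down from the given node and return the path to the target file."""
    path = path + (node.name,)
    if node.name == target:
        return path
    for child in node.children:
        found = path_to(child, target, path)
        if found:
            return found
    return None

print(path_to(root, "Line Item"))
```

Every search starts at the root, so Line Item can be reached only through Customer and then Sales Invoice.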
Properly designed tables possess the following four characteristics:
1. All occurrences at the intersection of a row and a column are a single value. No multiple values or repeating groups are allowed.
2. All attribute values in any column must be of the same class.
3. Each column in a given table must be uniquely named. However, different tables may contain columns with the same name.
4. Each row in the table must be unique in at least one attribute. This attribute is the primary key (PK).
Implicit linkage (absence of explicit pointers)
Data is presented as a collection of independent tables (absence of a tree or network structure).
Relations are formed by an attribute common to both tables in the relation (absence of pointers or explicit links).
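Implicit linkage can be sketched in plain Python: two independent tables are related only by matching values of a shared attribute. The tables, the attribute name cust_no, and the rows are hypothetical.

```python
# Two independent tables; neither stores a pointer to the other.
customers = [
    {"cust_no": 1, "name": "Smith"},
    {"cust_no": 2, "name": "Jones"},
]
sales = [
    {"invoice": 10, "cust_no": 1, "amount": 50.0},
    {"invoice": 11, "cust_no": 1, "amount": 20.0},
    {"invoice": 12, "cust_no": 2, "amount": 75.0},
]

def join(left, right, attribute):
    """Relate two tables by pairing rows whose common attribute values match."""
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right
        if left_row[attribute] == right_row[attribute]
    ]

linked = join(customers, sales, "cust_no")
for row in linked:
    print(row)
```

Because the link is just a matching value, the same function can relate any two tables that share an attribute.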
Various items of interest (customers, inventory, sales) are stored in separate tables. Space is used efficiently. Very flexible. Users can form ad hoc relationships.
Importance
Tables that have not been normalized are associated with three types of problems called anomalies: the update anomaly, the insertion anomaly, and the deletion anomaly. The importance of data normalization lies in making the database tables free from these anomalies.
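The three anomalies can be demonstrated on a few rows of a single table that mixes student, course, and instructor data. The rows reuse values from the student enrollment example in this chapter; the new instructor "Lee" and the number 8-0000 are hypothetical.

```python
# One table holding enrollment and instructor data together.
rows = [
    {"stdnt": 86432, "course": "Acct 315", "instr": "Ray", "tel": "8-4545"},
    {"stdnt": 86789, "course": "Hist 1", "instr": "Patch", "tel": "8-2378"},
    {"stdnt": 98653, "course": "Acct 1", "instr": "Ray", "tel": "8-4545"},
]

# Update anomaly: Ray's telephone number is stored once per enrollment,
# so a single change must be made in several rows.
rows_to_change = [row for row in rows if row["instr"] == "Ray"]

# Deletion anomaly: removing the only Hist 1 enrollment
# also removes everything known about instructor Patch.
after_delete = [row for row in rows if row["course"] != "Hist 1"]
patch_still_known = any(row["instr"] == "Patch" for row in after_delete)

# Insertion anomaly: a new instructor with no enrolled student
# has no student number to complete the row's key.
new_row = {"stdnt": None, "course": None, "instr": "Lee", "tel": "8-0000"}
can_insert = new_row["stdnt"] is not None

print(len(rows_to_change), patch_still_known, can_insert)
```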
Normalization process
Step 1. Identify and remove any repeating groups. Repeating groups are multiple data values at the intersection of rows and columns. When this is done, the table is in 1NF.
Table 1: Unnormalized database of Student Enrollment
Columns: Stdnt#, Stdnt, Major, Course, Crse desc, Instr, Office hrs, Loc, Tel no, Grade

Table 2: Student (3NF)
Stdnt#  Stdnt   Major
86432   Sethi   Acctg
86789   Archer  Mgt
98653   Mills   Acctg

Table 3: Course Grade (1NF)
Stdnt#  Course    Crse desc  Instr  Office hrs  Loc  Tel no  Grade
86432   Acct 315  Fin Acct   Ray    9-11        442  8-4545  A
86433   Acct 324  Mgt Acct   Paul   8-11        448  8-8945  A
86434   Math 21   Calc       Jones  1-3         323  8-2345  B
86789   Mgt 1     Intr Mgt   Buel   4-5         463  8-3436  C
86789   Hist 1    US Hist    Patch  9-11        342  8-2378  B
98653   Acct 1    Intr Acct  Ray    9-11        442  8-4545  B
98653   Math 21   Calc       Jones  1-3         323  8-2345  B
98653   Mgt 1     Intr Mgt   Buel   4-5         463  8-3436  C

Table 4: Student Grade (3NF)
Stdnt#  Course    Grade
86432   Acct 315  A
86433   Acct 324  A
86434   Math 21   B
86789   Mgt 1     C
86789   Hist 1    B
98653   Acct 1    B
98653   Math 21   B
98653   Mgt 1     C

Table 5: Course Instructor (2NF)
Course    Crse desc  Instr  Office hrs  Loc  Tel no
Acct 315  Fin Acct   Ray    9-11        442  8-4545
Acct 324  Mgt Acct   Paul   8-11        448  8-8945
Math 21   Calc       Jones  1-3         323  8-2345
Mgt 1     Intr Mgt   Buel   4-5         463  8-3436
Hist 1    US Hist    Patch  9-11        342  8-2378
Acct 1    Intr Acct  Ray    9-11        442  8-4545
Math 21   Calc       Jones  1-3         323  8-2345
Mgt 1     Intr Mgt   Buel   4-5         463  8-3436

Table 6: Course (3NF)
Course    Crse desc  Instr
Acct 315  Fin Acct   Ray
Acct 324  Mgt Acct   Paul
Math 21   Calc       Jones
Mgt 1     Intr Mgt   Buel
Hist 1    US Hist    Patch
Acct 1    Intr Acct  Ray

Table 7: Instructor (3NF)
Instr  Office hrs  Loc  Tel no
Ray    9-11        442  8-4545
Paul   8-11        448  8-8945
Jones  1-3         323  8-2345
Buel   4-5         463  8-3436
Patch  9-11        342  8-2378
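The split of the 1NF Course Grade table into the Student Grade, Course, and Instructor tables can be sketched in Python. The rows are a subset of the example data (students 86789 and 98653). The projection approach and the variable names are assumptions for illustration, not the procedure given in the slides.

```python
# 1NF rows: (Stdnt#, Course, Crse desc, Instr, Office hrs, Loc, Tel no, Grade)
course_grade_1nf = [
    (86789, "Mgt 1", "Intr Mgt", "Buel", "4-5", "463", "8-3436", "C"),
    (86789, "Hist 1", "US Hist", "Patch", "9-11", "342", "8-2378", "B"),
    (98653, "Acct 1", "Intr Acct", "Ray", "9-11", "442", "8-4545", "B"),
    (98653, "Math 21", "Calc", "Jones", "1-3", "323", "8-2345", "B"),
    (98653, "Mgt 1", "Intr Mgt", "Buel", "4-5", "463", "8-3436", "C"),
]

# Grade depends on the whole key (Stdnt#, Course).
student_grade = sorted({(r[0], r[1], r[7]) for r in course_grade_1nf})
# Description and instructor depend on Course alone.
course = sorted({(r[1], r[2], r[3]) for r in course_grade_1nf})
# Office hours, location, and telephone depend on the instructor.
instructor = sorted({(r[3], r[4], r[5], r[6]) for r in course_grade_1nf})

# Joining the three tables on their common attributes restores every 1NF row.
course_by_id = {c[0]: c for c in course}
instructor_by_name = {i[0]: i for i in instructor}
rebuilt = []
for stdnt, crse, grade in student_grade:
    _, desc, instr = course_by_id[crse]
    _, hours, loc, tel = instructor_by_name[instr]
    rebuilt.append((stdnt, crse, desc, instr, hours, loc, tel, grade))

print(len(student_grade), len(course), len(instructor))
```

Buel's office details are now stored once instead of twice, and no information is lost, since the join rebuilds the original rows.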