Chapter 11 Database
Data
Data refers to raw facts, figures, or information that can be in various forms, such as
numbers, text, images, or multimedia. Data is the foundation of information, and it
becomes meaningful and useful when processed and organized. There are two primary
types of data:
• Structured Data: Highly organized and formatted data that fits into predefined
categories, often found in databases and spreadsheets. Examples include tables of
numbers, dates, or categorical information.
• Unstructured Data: Information that lacks a specific format and is not easily
organized, such as text documents, emails, images, and videos. Unstructured data
requires advanced processing techniques, like natural language processing or image
recognition, to extract meaningful insights.
• Data can also be viewed in two ways: the logical view (the meaning, content, and
context of the data) and the physical view (the actual format and location of the data).
Data organization
• Character: A character is the most basic logical data element. It is a single letter,
number, or special character, such as a punctuation mark, or a symbol, such as $.
• Field: The next higher level is a field, or group of related characters. In our
example, Brown is in the data field for the Last Name of an employee. It consists
of the individual letters (characters) that make up the last name. A data field
represents an attribute (description or characteristic) of some entity (person,
place, thing, or object). For example, an employee is an entity with many
attributes, including his or her last name.
• Record: A record is a collection of related fields. A record represents a collection
of attributes that describe an entity. In our example, the payroll record for an
employee consists of the data fields describing the attributes for one employee.
These attributes are First Name, Last Name, Employee ID, and Salary.
• Table: A table is a collection of related records. For example, the
Payroll Table would include payroll information (records) for the
employees (entities).
• Database: A database is an integrated collection of logically related
tables. For example, the Personnel Database would include
all related employee tables, including the Payroll Table and the
Benefits Table.
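The levels described above (character, field, record, table, database) can be sketched in a few lines of Python. This is only an illustration, not how a DBMS stores data: the class name `PayrollRecord`, the sample employees other than the last name Brown, and all ID and salary values are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PayrollRecord:
    """One record: the related fields that describe a single employee (entity)."""
    first_name: str
    last_name: str
    employee_id: int
    salary: float


# A field is a group of related characters; a character is the smallest unit.
last_name_field = "Brown"
characters = list(last_name_field)

# A table is a collection of related records (sample values are hypothetical).
payroll_table = [
    PayrollRecord("Ann", "Brown", 1001, 52000.0),
    PayrollRecord("Raj", "Patel", 1002, 61000.0),
]

# A database is an integrated collection of logically related tables.
personnel_database = {
    "Payroll": payroll_table,
    "Benefits": [],  # left empty in this sketch
}
```

Each attribute of `PayrollRecord` corresponds to one field, and each object in `payroll_table` corresponds to one record.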
Key field
Each record in a table has at least one distinctive field, called the key
field. Also known as the primary key, this field uniquely identifies the
record. Tables can be related or connected to other tables by common
key fields.
For most employee databases, a key field is an employee identification
number. Key fields in different tables can be used to integrate the data
in a database. For example, in the Personnel Database, both the Payroll
and the Benefits tables include the field Employee ID. Data from the
two tables could be related by combining all records with the same key
field (Employee ID).
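As a rough sketch of how a common key field relates two tables, the Python below combines Payroll and Benefits records that share the same Employee ID. The function `join_on_key`, the field names, and the sample values are hypothetical; a real DBMS would perform this step with its own join operation.

```python
# Two small tables that share the key field "employee_id" (values are made up).
payroll = [
    {"employee_id": 1001, "last_name": "Brown", "salary": 52000.0},
    {"employee_id": 1002, "last_name": "Patel", "salary": 61000.0},
]
benefits = [
    {"employee_id": 1001, "health_plan": "Standard"},
    {"employee_id": 1002, "health_plan": "Premium"},
]


def join_on_key(left, right, key):
    """Combine records from two tables that have the same key field value."""
    # Index the right-hand table by key so each lookup is a dictionary access.
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the fields of both records into one combined record.
            joined.append({**row, **match})
    return joined


combined = join_on_key(payroll, benefits, "employee_id")
```

Because the key field uniquely identifies each record, every payroll record matches at most one benefits record.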
Batch versus Real-Time Processing
• Traditionally, data is processed in one of two ways: batch processing (later) or
real-time processing (now). Both methods have been used to handle common
record-keeping activities such as payroll and sales orders.
• Batch processing: In batch processing, data is collected over several hours,
days, or even weeks. It is then processed all at once as a "batch." If you have a
credit card, your bill probably reflects batch processing. That is, during the
month, you buy things and charge them to your credit card. Each time you
charge something, an electronic copy of the transaction is sent to the credit
card company. At some point in the month, the company's data processing
department puts all those transactions (and those of many other customers)
together and processes them at one time. The company then sends you a
single bill totaling the amount you owe.
• Real-time processing: Real-time processing, also known as online
processing, occurs when data is processed at the same time the
transaction occurs.
• For example, whenever you request funds at an ATM, real-time
processing occurs. After you have provided account information and
requested a specific withdrawal, the bank's computer verifies that you
have sufficient funds in your account. If you do, then the funds are
dispensed to you, and the bank immediately updates the balance of
your account.
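The difference between the two approaches can be sketched in Python. The classes `CreditCardAccount` and `BankAccount`, their methods, and the amounts are assumptions chosen to mirror the credit card and ATM examples above; they are not taken from any real banking system.

```python
class CreditCardAccount:
    """Batch processing: charges are collected and totaled later in one pass."""

    def __init__(self):
        self.pending = []

    def charge(self, amount):
        # The transaction is only recorded here, not processed.
        self.pending.append(amount)

    def run_monthly_batch(self):
        # All collected transactions are processed at one time.
        bill = sum(self.pending)
        self.pending.clear()
        return bill


class BankAccount:
    """Real-time processing: each withdrawal is verified and applied at once."""

    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        if amount > self.balance:
            return False  # insufficient funds, nothing is dispensed
        self.balance -= amount  # the balance is updated immediately
        return True


card = CreditCardAccount()
for amount in (25.0, 40.0, 10.0):
    card.charge(amount)
bill = card.run_monthly_batch()

account = BankAccount(100.0)
first = account.withdraw(60.0)
second = account.withdraw(60.0)
```

In this sketch the second withdrawal is refused because the first one has already reduced the balance, which is the behavior real-time processing is meant to guarantee.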
Databases
• Many organizations have multiple files on the same subject or person. For example,
a customer's name and address could appear in different files within the sales
department, billing department, and credit department. This is called data
redundancy. If the customer moves, then the address in each file must be updated.
If one or more files are overlooked, problems will likely result.
• For example, a product ordered might be sent to the new address, but the bill might
be sent to the old address. This situation results from a lack of data integrity.
Moreover, data spread around in different files is not as useful. The marketing
department, for instance, might want to offer special promotions to customers who
order large quantities of merchandise. To identify these customers, the marketing
department would need to obtain permission and access to files in the billing
department. It would be much more efficient if all data were in a common
database. A database can make the needed information available.
Need for Databases
For an organization, there are many advantages to having databases:
• Sharing: In organizations, information from one department can be readily shared with others.
Billing could let marketing know which customers ordered large quantities of merchandise.
• Security: Users are given passwords or access only to the kind of information they need. Thus,
the payroll department may have access to employees' pay rates, but other departments would
not.
• Less data redundancy: Without a common database, individual departments have to create and
maintain their own data, and data redundancy results. For example, an employee's home
address would likely appear in several files. Redundant data causes inefficient use of storage
space and data maintenance problems.
• Data integrity: When there are multiple sources of data, each source may have variations. A
customer's address may be listed as "Main Street" in one system and "Main St." in another. With
discrepancies like these, it is probable that the customer would be treated as two different
people.
Database Management
• In order to create, modify, and gain access to a database, special software is required. This
software is called a database management system, which is commonly abbreviated DBMS.
Some DBMSs, such as Microsoft Access, are designed specifically for personal computers.
Other DBMSs are designed for specialized database servers. DBMS software is made up of
five parts or subsystems: DBMS engine, data definition, data manipulation, application
generation, and data administration.
• The DBMS engine provides a bridge between the logical view of the data and the physical
view of the data. When users request data (logical perspective), the DBMS engine handles
the details of actually locating the data (physical perspective).
• The data definition subsystem defines the logical structure of the database by using a data
dictionary or schema. This dictionary contains a description of the structure of data in the
database. For a particular item of data, it defines the names used for a particular field. It
defines the type of data for each field (text, numeric, time, graphic, audio, and video). An
example of an Access data dictionary form is presented in
• The data manipulation subsystem provides tools for maintaining and
analyzing data. Maintaining data is known as data maintenance. It
involves adding new data, deleting old data, and editing existing data.
Analysis tools support viewing all or selected parts of the data,
querying the database, and generating reports. Specific tools include
query-by-example and a specialized programming language called
structured query language (SQL).
• The application generation subsystem provides tools to create data
entry forms and specialized programming languages that interface or
work with common and widely used programming languages such as
C++.
• The data administration subsystem helps to manage the overall
database, including maintaining security, providing disaster recovery
support, and monitoring the overall performance of database
operations. Larger organizations typically employ highly trained
computer specialists, called database administrators (DBAs), to
interact with the data administration subsystem. Additional duties of
database administrators include determining processing rights or
determining which people have access to what kinds of data in the
database.
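To make the data definition and data manipulation subsystems more concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite stands in for the DBMS here (the text names Microsoft Access, which is not used in this example), and the table name `payroll`, its columns, and the sample rows are assumptions.

```python
import sqlite3

# An in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

# Data definition: describe the structure (field names and data types).
conn.execute(
    """
    CREATE TABLE payroll (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        salary      REAL NOT NULL
    )
    """
)

# Data maintenance: add new data, edit existing data, delete old data.
conn.executemany(
    "INSERT INTO payroll VALUES (?, ?, ?, ?)",
    [
        (1001, "Ann", "Brown", 52000.0),
        (1002, "Raj", "Patel", 61000.0),
        (1003, "Li", "Chen", 48000.0),
    ],
)
conn.execute("UPDATE payroll SET salary = ? WHERE employee_id = ?", (55000.0, 1001))
conn.execute("DELETE FROM payroll WHERE employee_id = ?", (1003,))

# Analysis: query selected parts of the data with SQL.
rows = conn.execute(
    "SELECT last_name, salary FROM payroll WHERE salary > ? ORDER BY last_name",
    (50000.0,),
).fetchall()
conn.close()
```

The caller only names the table and fields it wants (the logical view); where and how the rows are stored (the physical view) is handled by the DBMS engine.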
Database structure
DBMS programs are designed to work with data that is logically structured or arranged in a particular way.
This arrangement is known as the database model. These models define rules and standards for all the data
in a database. For example, Microsoft Access is designed to work with databases using the relational data
model.
Five common database models are hierarchical, network, relational, multidimensional, and object-oriented.
• Hierarchical Database
At one time, nearly every DBMS designed for mainframes used the hierarchical data model. In a hierarchical
database, fields or records are structured in nodes. Nodes are points connected like the branches of an
upside-down tree. Each entry has one parent node, although a parent may have several child nodes. This is
sometimes described as a one-to-many relationship. To find a particular field, you have to start at the top
with a parent and trace down the tree to a child.
An example of a hierarchical database is a system to organize music files. The parent node is the music
library for a particular user. This parent has four children, labeled "artist." Coldplay, one of the children, has
three children of its own. They are labeled "album." The Greatest Hits album has three children, labeled
"song."
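The music library example of a hierarchical database can be sketched as a small tree in Python. The `Node` class and its methods are hypothetical, and apart from Coldplay and Greatest Hits, every artist, album, and song name is a placeholder, since the text does not name them.

```python
class Node:
    """A node in a hierarchical database: one parent, any number of children."""

    def __init__(self, name, label):
        self.name = name
        self.label = label
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self  # each entry has exactly one parent node
        self.children.append(child)
        return child

    def find(self, path):
        """Start at this node and trace down the tree along a list of names."""
        node = self
        for name in path:
            node = next((c for c in node.children if c.name == name), None)
            if node is None:
                return None
        return node


# The parent node is the music library; it has four children labeled "artist".
library = Node("Music Library", "library")
coldplay = library.add_child(Node("Coldplay", "artist"))
for i in (2, 3, 4):
    library.add_child(Node(f"Artist {i}", "artist"))  # placeholder names

# Coldplay has three children labeled "album".
hits = coldplay.add_child(Node("Greatest Hits", "album"))
for i in (2, 3):
    coldplay.add_child(Node(f"Album {i}", "album"))  # placeholder names

# The Greatest Hits album has three children labeled "song".
for i in (1, 2, 3):
    hits.add_child(Node(f"Song {i}", "song"))  # placeholder names

song = library.find(["Coldplay", "Greatest Hits", "Song 2"])
```

The `find` call shows the access pattern described above: a lookup starts at the top parent and follows one branch down to the child.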