ICT II Topic One: Introduction to Data Management: by Bwiino Keefa Email: MUBS Jinja Campus Dept. Marketing & Management
What Are the Benefits of Good Data
Management?
Data Management
Data management is a broad field of study, but
essentially it is the process of managing data as a
resource that is valuable to an organization or
business.
Data management can also be defined as the
development and execution of architectures,
policies, practices and procedures that manage the
information lifecycle needs of an enterprise
effectively.
Areas of Data Management
Data modeling - is creating a structure for the data
that you collect and use, and then organizing this
data so that it is easy to access and efficient to
store and retrieve for reports and analysis.
Data warehousing - is storing data effectively so
that it can be accessed and used efficiently in the
future.
Data movement - is the ability to move data from
one place to another. For instance, data needs to be
moved from where it is collected to a database and
then to an end user.
Database administration - is extremely important in
managing data. Every organization or enterprise needs
database administrators who are responsible for the
database environment.
Data mining - is a process in which large amounts of data
are sifted through to reveal trends, relationships and patterns.
Data mining is a crucial component of data management
because it exposes interesting information about the data
being collected. Note that data is primarily collected so that
it can be used to find the patterns, relationships and trends
that help a business grow or create profit.
File Organization Terms and Concepts
[Figure: a student file with the fields NAME, STUDENT No, COURSE and GRADE, with the key field labelled]
Accessing Records from Computer Files
Advantages of File-Based Approach
Backup:
It is possible to take fast, automatic backups of
data stored in the files of computer-based systems.
Computer systems provide functionality to serve
this purpose. It is also possible to develop a
specific application program for this purpose.
Compactness:
It is possible to store data compactly.
Data Retrieval:
Computer-based systems provide enhanced
techniques for retrieving data stored in files
easily and efficiently.
Editing:
It is easy to edit any information stored in
computers in the form of files.
Specific application programs or editing
software can be used for this purpose.
Remote Access:
In computer-based systems, it is possible to
access data remotely.
So, to access data, a user does not need to be
present at the location where the data is kept.
Sharing:
Data stored in the files of computer-based systems
can be shared among multiple users at the same
time.
Problems of File Based Approach
Data Redundancy:
It is possible that the same information is
duplicated in different files. This data
redundancy wastes storage space.
Data Inconsistency:
Because the same data is stored in several
places, the copies may stop agreeing with one
another when one copy is updated and the others
are not.
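As a small illustration of how redundancy leads to inconsistency, the sketch below models two departmental files as Python dictionaries. The file names, the student record and the phone numbers are made up for the example; they do not come from any real system.

```python
# Two departments each keep their own file (made-up data) with a copy
# of the same student's phone number: this duplication is data redundancy.
registry_file = {"S001": {"name": "Amina", "phone": "0700-000001"}}
finance_file = {"S001": {"name": "Amina", "phone": "0700-000001", "balance": 150000}}

# The student reports a new number, but only the registry updates its copy.
registry_file["S001"]["phone"] = "0700-000002"


def inconsistent_fields(a, b):
    """Return the shared fields whose values differ between two copies of a record."""
    return sorted(field for field in a.keys() & b.keys() if a[field] != b[field])


# The two copies now disagree: this is data inconsistency.
differences = inconsistent_fields(registry_file["S001"], finance_file["S001"])
print(differences)  # ['phone']
```

Nothing in a file-based system forces the second copy to be updated, which is why the two files can silently drift apart.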
Difficulty in Accessing Data:
Accessing data is neither convenient nor
efficient in a file processing system, because a
new program often has to be written for each new
kind of request.
Limited Data Sharing:
Data is scattered across various files. Different
files may also have different formats, and may be
stored in different folders belonging to
different departments.
Because of this data isolation, it is difficult to
share data among different applications.
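Integrity Problems:
Data integrity means that the stored data is both
correct and consistent. For this purpose the data
must satisfy correctness constraints. A file-based
system has no central mechanism for enforcing
such constraints.
Atomicity Problems:
Any operation on the data must be atomic. This
means it must happen in its entirety or not at
all. This is difficult to guarantee in a
file-based system, for example when a failure
interrupts an update part-way through.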
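For contrast, here is a minimal sketch of what atomicity looks like when a DBMS provides it, using Python's built-in `sqlite3` module. The `account` table, the account names and the amounts are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("fees", 0), ("student", 100)])
conn.commit()


def transfer(amount):
    """Move money from 'student' to 'fees'; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'fees'", (amount,)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'student'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(60)   # succeeds: fees 60, student 40
second = transfer(60)  # second UPDATE would make the balance negative, so both are undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(first, second, balances)
```

The failed transfer leaves no trace: the credit to `fees` is rolled back together with the rejected debit, which is the "entirely or not at all" behaviour described above.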
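Concurrent Access Anomalies:
Multiple users are allowed to access data
simultaneously, for the sake of better
performance and faster response. Without
coordination, concurrent updates to the same
file can interfere with one another and leave
the data inconsistent.
Security Problems:
Data should be accessible to users only in a
limited way.
Each user should be allowed to access only the
data that his or her work requires, which is
difficult to enforce when data is spread across
separate files.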
Data dependence - In a file-based system, the physical
structure and storage of the data files and records are defined
in the application program code. This characteristic is known
as program-data dependence. Changing an existing structure is
difficult and forces the programs that use it to be modified.
Such maintenance activities are time-consuming and subject
to error.
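Program-data dependence can be sketched with Python's `struct` module, which reads fixed-width binary records. The record layouts and the sample student below are hypothetical; the point is only that the layout lives inside the program, so a change to the file breaks the program.

```python
import struct

# Hypothetical layout hard-coded in the program:
# 10-byte name, 4-byte student number, 1-byte grade.
OLD_LAYOUT = struct.Struct("<10sI1s")


def read_student(raw: bytes):
    """Decode one record using the layout the program was written for."""
    name, number, grade = OLD_LAYOUT.unpack(raw)
    return name.rstrip(b"\x00").decode(), number, grade.decode()


record = OLD_LAYOUT.pack(b"Amina", 1001, b"A")
print(read_student(record))

# The file later gains an 8-byte course code. The data file changes,
# but the program above still assumes the old layout.
NEW_LAYOUT = struct.Struct("<10sI8s1s")
new_record = NEW_LAYOUT.pack(b"Amina", 1001, b"ICT2", b"A")

broke = False
try:
    read_student(new_record)
except struct.error as exc:
    broke = True
    print("old program cannot read the new file:", exc)
```

Every program that embeds the old layout would need the same edit, which is the maintenance burden the slide describes.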
Incompatible file format - The structure of a file
depends on the application programming language.
A file structure provided by one programming language,
such as the direct file or indexed-sequential file
available in COBOL, may be different from the structure
generated by another programming language such as C.
Lack of flexibility - It is very difficult to create new
reports from the data when needed. Ad hoc reports are
practically impossible; a new report could require several
weeks of work by more than one programmer and the creation
of intermediate files to combine data from disparate files.
Understanding Terms
Database Approach to data
management
Basic Database Definitions
Data item (field):
It is a character or group of characters that has a
specific meaning, for example cid and cname in a
customer table.
A record:
It is a collection of logically related fields. A
record consists of a value for each field.
A file:
It is a collection of related records arranged in a specific
sequence.
Metadata:
A set of data that describes and gives information about
other data. In other words, data about data is called
metadata.
System Catalog:
The system catalog is a collection of tables and views
that contain important information about a database. A
system catalog is available for each database.
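To make metadata and the system catalog concrete, the sketch below inspects SQLite's own catalog from Python. SQLite is used here only as a convenient example DBMS; the `customer` table with `cid` and `cname` mirrors the field example given earlier, and other systems expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cid INTEGER PRIMARY KEY, cname TEXT NOT NULL)")

# sqlite_master is SQLite's catalog table: one row per table, index, view or trigger.
catalog = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(catalog)

# PRAGMA table_info returns metadata about each column:
# (position, name, declared type, not-null flag, default value, primary-key flag)
columns = conn.execute("PRAGMA table_info(customer)").fetchall()
for _, name, col_type, notnull, _, pk in columns:
    print(name, col_type, bool(notnull), bool(pk))
```

Neither query returns any customer data; both return data about the data, which is what the definitions above call metadata.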
Data dictionary:
A data dictionary is a file that contains metadata and is usually a part
of the system catalog. It has the following four components: Entities,
Attributes, Relationships and Keys
Entity
– A generalized class of people, places, or things (objects) for which
data are collected, stored, and maintained
– E.g., Customer, Employee
Attribute
– A characteristic of an entity; something that describes the entity
– E.g., Customer name, Employee name
Database Keys
Keys
– A field or set of fields in a record that is used to identify the
record
Primary Key
– The key chosen as the main unique identifier of a record, e.g.
regno, employee_ID
Candidate Key
– Any field or minimal set of fields that could uniquely identify a
record, and so could serve as the primary key, e.g. NIN, NSSFN,
TIN, Passport Number
Foreign Key
– A field in one table that refers to the primary key of another
table; it is used to enforce referential integrity between the two
tables
Compound Key
– A primary key formed by combining more than one
field, e.g. Studentno and courseID
Composite Key
– Similar to a compound key, but each of the columns
that form it is itself a key in that table. (Many
texts use the terms compound key and composite key
interchangeably.)
Surrogate Key
– A kind of primary key that is not defined by the
designer from the business data. It is a
system-generated value, typically a sequential or
random number, that uniquely identifies the entity
in the system and is normally not shown to the user.
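The key types above can be sketched as table definitions. The example uses Python's `sqlite3` module; the `student` and `enrolment` tables, their columns and the sample rows are assumptions chosen for the illustration, loosely following the regno, NIN, Studentno and courseID examples on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript("""
CREATE TABLE student (
    regno TEXT PRIMARY KEY,          -- primary key
    nin   TEXT UNIQUE,               -- another candidate key
    name  TEXT NOT NULL
);
CREATE TABLE enrolment (
    regno     TEXT NOT NULL REFERENCES student(regno),  -- foreign key
    course_id TEXT NOT NULL,
    grade     TEXT,
    PRIMARY KEY (regno, course_id)   -- compound key: two fields together
);
""")


def try_insert(sql, params):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("INSERT INTO student VALUES (?, ?, ?)", ("S001", "NIN-1", "Amina")),
    try_insert("INSERT INTO enrolment VALUES (?, ?, ?)", ("S001", "ICT2", "A")),
    # Same (regno, course_id) pair again: rejected by the compound primary key.
    try_insert("INSERT INTO enrolment VALUES (?, ?, ?)", ("S001", "ICT2", "B")),
    # No student S999 exists: rejected by the foreign key.
    try_insert("INSERT INTO enrolment VALUES (?, ?, ?)", ("S999", "ICT2", "B")),
]
print(results)
```

The last two inserts are refused by the DBMS itself, without any checking code in the application.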
Database Management System (DBMS)
The Contemporary Database Environment
Functional Components of DBMS
Data Definition Language (DDL) - It defines each
element as it appears in the database. The DDL is the
formal language programmers use to specify the content
and structure of the database.
Data Manipulation Language (DML) - It is a set of
commands that enable programmers to append, modify,
update, and retrieve data. The DML uses simple verbs
such as sort, delete, insert, select and display.
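The difference between DDL and DML can be seen in a few SQL statements. The sketch below runs them through Python's `sqlite3` module; the `customer` table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines the content and structure of the database.
conn.execute(
    "CREATE TABLE customer (cid INTEGER PRIMARY KEY, cname TEXT NOT NULL, city TEXT)"
)

# DML: appends, modifies, deletes and retrieves the data itself.
conn.executemany(
    "INSERT INTO customer (cid, cname, city) VALUES (?, ?, ?)",
    [(1, "Akello", "Jinja"), (2, "Mukasa", "Kampala"), (3, "Nabirye", "Jinja")],
)
conn.execute("UPDATE customer SET city = ? WHERE cid = ?", ("Mbale", 2))
conn.execute("DELETE FROM customer WHERE cid = ?", (3,))

rows = conn.execute("SELECT cid, cname, city FROM customer ORDER BY cid").fetchall()
print(rows)
```

`CREATE TABLE` changes what the database can hold, while `INSERT`, `UPDATE`, `DELETE` and `SELECT` only work with the rows inside that structure.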
A query language - It enables the user to
make queries against the database. SQL is the
standard query language for relational
database management systems.
Report generators - They enable the generation
of reports from a database. These programs
allow reports to be presented using pictures,
graphics, maps etc.
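A query and a very simple text report can be sketched as follows. The `sale` table and its figures are invented for the illustration, and the formatted printout stands in for what a real report generator would produce with charts or graphics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sale VALUES (?, ?)",
    [("East", 120), ("East", 80), ("Central", 200), ("North", 50)],
)

# The query states WHAT is wanted; the DBMS decides how to compute it.
query = """
SELECT region, COUNT(*) AS sales, SUM(amount) AS total
FROM sale
GROUP BY region
ORDER BY total DESC, region
"""
report = conn.execute(query).fetchall()

# A minimal tabular report built from the query result.
print(f"{'Region':<10}{'Sales':>6}{'Total':>8}")
for region, sales, total in report:
    print(f"{region:<10}{sales:>6}{total:>8}")
```

A new report only needs a new query, not a new program and intermediate files.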
Advantages of Database Approach
Control of data redundancy - The database approach
attempts to eliminate redundancy by integrating the files.
Although the database approach does not eliminate
redundancy entirely, it controls the amount of redundancy
inherent in the database.
Data consistency - By eliminating or controlling
redundancy, the database approach reduces the risk of
inconsistencies occurring. It ensures that all copies of the
data are kept consistent.
More information from the same amount of data - With
the integration of operational data in the database
approach, it may be possible to derive additional
information from the same data.
Sharing of data - The database belongs to the entire
organization and can be shared by all authorized users.
Improved data integrity - Database integrity refers to the
validity and consistency of stored data. Integrity is usually
expressed in terms of constraints, which are consistency
rules that the database is not permitted to violate.
Improved security - The database approach protects the
data from unauthorized users. Protection may take the form
of user names and passwords that identify each type of user
and its access rights for operations including retrieval,
insertion, updating and deletion.
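An integrity constraint of the kind mentioned above can be sketched with a `CHECK` rule. The `result` table, the 0 to 100 mark range and the sample rows are assumptions made for this example, run through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE result (
    student_no TEXT NOT NULL,
    course     TEXT NOT NULL,
    mark       INTEGER NOT NULL CHECK (mark BETWEEN 0 AND 100),
    PRIMARY KEY (student_no, course)
)""")


def add_result(student_no, course, mark):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO result VALUES (?, ?, ?)", (student_no, course, mark))
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    add_result("S001", "ICT2", 78),   # valid
    add_result("S002", "ICT2", 140),  # violates the CHECK rule on mark
    add_result("S001", "ICT2", 65),   # duplicates the primary key
]
print(outcomes)
```

The rules are stated once, in the table definition, and every program that writes to the table is held to them.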
Enforcement of standards - The integration of the
database enforces the necessary standards, including data
formats, naming conventions, documentation standards,
update procedures and access rules.
Increased concurrency - The database can manage concurrent
data access effectively. It ensures that users do not
interfere with one another in a way that would cause loss
of information or loss of integrity.
Improved backup and recovery services - A modern
database management system provides facilities to
minimize the amount of processing that is lost
following a failure, by using the transaction approach.
Disadvantages of Database Approach
Complexity - A database management system is an extremely complex
piece of software. All parties must be familiar with its functionality
to take full advantage of it. Therefore, training for the
administrators, designers and users is required.
Size - The database management system consumes a substantial
amount of main memory as well as a large amount of disk
space in order to run efficiently.
Cost of DBMS - A multi-user database management system may be
very expensive. Even after the installation, there is a high recurrent
annual maintenance cost on the software.
Cost of conversion - When moving from a file-based system to a
database system, the company incurs additional expenses
for hardware acquisition and training.
Performance - As the database approach caters for
many applications rather than exclusively for a particular
one, some applications may not run as fast as before.
Higher impact of a failure - The database approach
increases the vulnerability of the system because of
centralization. As all users and applications rely on the
availability of the database, the failure of any component
can bring operations to a halt and seriously affect services
to the customer.
Database Principles
Data Independence - This describes the separation of data and
data handling from the functional processing of the data and
from the programs that use the data.
Data Integrity - This is the correctness and consistency of the
data, which is easier to maintain where data is held in a single,
integrated database.
Data Redundancy/Data Duplication - This describes the case
where a particular data element is kept separately in several
places (records, files, etc.) in the database.
Data Security - This is the ability of a database system to
preserve and protect the data which it holds.
Database Models
A database model is a collection of logical
constructs used to represent the data structure
and relationships within the database
Conceptual models: focus on the logical nature of
data representation
Implementation models: emphasize how the
data are represented in the database
Relationships in Conceptual Models
One-to-one(1:1)
One-to-many (1:M)
Many-to-many (M:N)
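The three relationship types can be sketched as table definitions. The department, manager, project and employee tables below are hypothetical and chosen only to show one common way of representing each relationship in a relational DBMS, here SQLite through Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);

-- One-to-one (1:1): the UNIQUE foreign key allows at most one manager per department.
CREATE TABLE manager (
    manager_id INTEGER PRIMARY KEY,
    dept_no    INTEGER UNIQUE REFERENCES department(dept_no)
);

-- One-to-many (1:M): many projects may point at the same department.
CREATE TABLE project (
    project_no INTEGER PRIMARY KEY,
    dept_no    INTEGER REFERENCES department(dept_no)
);

-- Many-to-many (M:N): a junction table pairs employees with projects.
CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_name TEXT);
CREATE TABLE assignment (
    emp_no     INTEGER REFERENCES employee(emp_no),
    project_no INTEGER REFERENCES project(project_no),
    PRIMARY KEY (emp_no, project_no)
);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

A many-to-many relationship cannot be stored with a single foreign key column, which is why the extra `assignment` table is needed.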
Hierarchical Database model
A Hierarchical Database for a Human Resources System
Network Data Model
Relational Data Model
Data Table 1: Project Table, with columns Project Number, Description, Dept. Number
Data Table 2: Department Table, with columns Dept. Number, Dept. Name, Manager SSN
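In the relational model the two tables are linked through the shared Dept. Number column. The sketch below builds both tables with Python's `sqlite3` module and joins them; the column names are adapted from the table headings, and all row values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_number INTEGER PRIMARY KEY,
    dept_name   TEXT,
    manager_ssn TEXT
);
CREATE TABLE project (
    project_number INTEGER PRIMARY KEY,
    description    TEXT,
    dept_number    INTEGER REFERENCES department(dept_number)
);
INSERT INTO department VALUES (10, 'Marketing', '000-00-0001'), (20, 'Finance', '000-00-0002');
INSERT INTO project VALUES (1, 'Brand survey', 10), (2, 'Budget review', 20), (3, 'Product launch', 10);
""")

# Join the tables on the common column to combine project and department data.
rows = conn.execute("""
SELECT p.project_number, p.description, d.dept_name
FROM project AS p
JOIN department AS d ON d.dept_number = p.dept_number
ORDER BY p.project_number
""").fetchall()
print(rows)
```

The department name is stored once, in the department table, and is looked up through the join rather than repeated in every project row.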
Entity Relationship Database
Model
Complements the relational data model
concepts
Represented in an entity relationship
diagram (ERD)
Based on entities, attributes, and
relationships
Database Types
• Flat file
– Has no relationship between its records
– Used to store and manipulate a single table or file
– Stores each record as a line of text, and uses commas, tabs, or other delimiters within the
line to separate the items, e.g. comma-separated values (CSV)
• File organizer
– Goes beyond the capabilities of a flat file to store and/or retrieve data
• Single user
– Only one person can use the database at any time (e.g. Microsoft Outlook and Quicken,
used to store and manipulate personal data)
• Multiuser
– Networked computer systems need multiuser DBMSs
– Allows several people in an organization to access the data and to see each other’s changes
• General-purpose database
– Can be used for a large number of applications
• Special-purpose database
– Designed for a limited number of applications or to serve a specific need
• Front-end application
– One that directly interacts with people or users
• Back-end application
– Interacts with other programs or applications
• System designers are increasingly using the Web as the front end to database systems
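The flat file type described in this list can be illustrated with Python's `csv` module. The file content below is made up and held in memory rather than on disk; the field names follow the student file example used earlier in the slides.

```python
import csv
import io

# Made-up flat-file content: one record per line, commas between the items.
flat_file = io.StringIO(
    "name,student_no,course,grade\n"
    "Amina,1001,ICT II,A\n"
    "Brian,1002,ICT II,B\n"
)

records = list(csv.DictReader(flat_file))
for record in records:
    print(record["student_no"], record["name"], record["grade"])

# A flat file has no index and no relationships between records:
# finding one record means scanning the lines one by one.
match = next((r for r in records if r["student_no"] == "1002"), None)
print(match)
```

Everything is text, so the program has to know for itself that `student_no` is meant to be a number and that it should be unique.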
STEPS IN DATABASE DESIGN
• Requirement analysis
What does the user want?
• Conceptual database design
Defining the entities and attributes, and
the relationships between them, which
gives the ER model
• Physical database design
Implementation of the conceptual design
using a database management system
Normalization
Trends – Distributed Databases
• Distributed database
– Also called a virtualized database
– The actual data may be spread across several databases at different
locations, allowing more users direct access at different user sites
• Master database file: a database that records the existence of all other
databases, the location of those database files and the
initialization information for the database
• Transaction database file: records transactions, where a transaction is a
unit of work performed within a DBMS against a database and treated in a
coherent and reliable way, independent of other transactions
• Replicated database
– A database that holds a duplicate set of frequently used data
Centralized Databases
• Used by a single central processor or by multiple
processors in a client/server network
Database Administration
• Database administrator
– A skilled and trained computer professional who directs all
activities related to an organization’s database, including providing
security from intruders
• Responsible for
– Overall design and coordination of the database
– Development and maintenance of schemas
– Development and maintenance of the data dictionary
– Implementation of the DBMS
– System and user documentation
– User support and training
– Overall operation of the DBMS
– Testing and maintaining of the DBMS
– Establishing emergency and recovery procedures
Database Recoverability
Recoverability is usually defined as storing data as a
backup and then testing the backups to make sure that
they are valid.
Integrity means that the data pulled for particular
records or files is in fact valid.
Data integrity is extremely important, especially
when creating reports or when data is used for
analysis. If the data is invalid, the results will
be worthless.
Database Security
Security is an essential task for database administrators.
For instance, database administrators are usually in charge
of granting clearance and access to particular databases in
an organization.
Another important task is availability. Availability is
defined as making sure a database is up and running. The
more uptime, usually the higher the level of productivity.
Performance is related to availability; it means getting as
much as possible out of the hardware, applications and data.
Performance is usually constrained by an organization's
budget, physical equipment and resources.
Ensuring Data Quality
• The quality of decision making in a firm is directly
related to the quality of data in its databases.
• Data Quality Audit: Structured survey of the accuracy
and level of completeness of the data in an
information system
• Data Cleansing: Consists of activities for detecting
and correcting data in a database or file that are
incorrect, incomplete, improperly formatted, or
redundant
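A data quality audit and a cleansing step can be sketched in a few lines of Python. The customer records, the field names and the cleansing rules (trim spaces, capitalize names, keep the first copy of a duplicated ID) are assumptions made for this illustration, not a standard procedure.

```python
# Made-up customer records with typical quality problems:
# stray spaces, a missing email, a duplicate row and a missing name.
records = [
    {"cid": "1", "cname": " akello ", "email": "akello@example.com"},
    {"cid": "2", "cname": "Mukasa", "email": ""},
    {"cid": "2", "cname": "Mukasa", "email": ""},
    {"cid": "3", "cname": "", "email": "nabirye@example.com"},
]


def completeness(rows, field):
    """Audit measure: the share of rows in which the field is filled in."""
    filled = sum(1 for row in rows if row[field].strip())
    return filled / len(rows)


def cleanse(rows):
    """Tidy the names and drop rows that repeat an earlier customer ID."""
    seen, cleaned = set(), []
    for row in rows:
        if row["cid"] in seen:
            continue  # redundant copy of a record already kept
        seen.add(row["cid"])
        cleaned.append({**row, "cname": row["cname"].strip().title()})
    return cleaned


email_completeness = completeness(records, "email")
cleaned = cleanse(records)
print(email_completeness)
print([row["cname"] for row in cleaned])
```

The audit only measures the problem (here, half of the email fields are empty); cleansing repairs what can be repaired, while the missing name for customer 3 still has to be collected again.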