Lecture Notes 1 - AD - Database Concepts - An Overview

The document provides an overview of databases, defining them as structured collections of related data managed by a Database Management System (DBMS). It contrasts flat-file systems with database systems, highlighting their differences in organization, redundancy, scalability, and data integrity. Additionally, it discusses key database concepts such as data vs. metadata, data sharing, integrity constraints, and data models, emphasizing the importance of structured data management in various applications.

Uploaded by

mushabati6jr

CIT3641 - LECTURE NOTES 1

ADVANCED DATABASES
Database Concepts

An Overview
What is a Database?

 A "database" is a structured and organized collection of related data that is stored electronically in a computer system.

 It is designed to efficiently manage, store, and retrieve information in a way that supports data integrity, security, and ease of access.

 A database typically consists of tables, each containing rows and columns, with relationships defined between the tables.

 The data in a database can be queried, updated, and manipulated using a database management system (DBMS).

 Databases are widely used in various applications and industries to store and manage large volumes of structured data.

 A database may also be defined as a “Repository of Data”.

 Here, the term "repository" refers to a centralized location or container where data is stored, organized, and managed.

 A repository, in the context of databases, serves as a structured and controlled storage space for various types of data.

 It implies a place where data is stored in a systematic manner, often following a predefined structure and format.

Key characteristics of a database being described as a repository include:

 Centralization: The term "repository" suggests that the data is stored in a central location, making it easily accessible and manageable.

 Organization: Data in a repository is typically organized in a structured manner, often using tables, rows, and columns, to facilitate efficient retrieval and management.

 Control: A repository implies a level of control over the data, including mechanisms for data integrity, security, and access control. This ensures that data is stored and managed in a reliable and secure manner.

 Storage: The repository serves as a storage space for different types of data, providing a means to store and retrieve information as needed.

 Management: A repository involves not just storage but also management functionalities, such as querying, updating, and manipulating data. This is typically facilitated by a Database Management System (DBMS).

In summary, when describing a database as a repository of data, it emphasizes that the database is a centralized, organized, and controlled storage space for managing and storing various types of data in a structured manner.
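To make the idea of tables, rows, columns, and relationships concrete, here is a minimal sketch using Python's standard-library `sqlite3` module as a small DBMS. The table names (`department`, `employee`), the column names, and the sample rows are assumptions made up for illustration; they do not come from these notes.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two tables. employee.dept_id refers to department.dept_id (the relationship).
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department(dept_id)
);
""")

# Each tuple becomes one row; each position in the tuple is one column.
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Loans"), (2, "Accounts")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(10, "Chanda", 1), (11, "Mwila", 2), (12, "Banda", 1)])

# Query across the relationship: who works in the Loans department?
loans_staff = [row[0] for row in conn.execute(
    "SELECT e.name FROM employee AS e "
    "JOIN department AS d ON d.dept_id = e.dept_id "
    "WHERE d.name = ? ORDER BY e.name", ("Loans",))]
print(loans_staff)
```

The query names the tables and columns it needs and leaves the work of locating the rows to the DBMS.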
Database Management System

 A database management system (DBMS) is a collection of programs that enable users to create and maintain databases and control all access to them.

 The primary goal of a DBMS is to provide an environment that is both convenient and efficient for users to retrieve and store information.

 With the database approach, we can have the traditional banking system as shown in the figure below.

 In this bank example, a DBMS is used by the Personnel Department, the Account Department and the Loan Department to access the shared corporate database.
Because of the versatility of databases, we find them
powering all sorts of projects. A database can be linked to:

 A website that is capturing registered users

 A client-tracking application for social service


organizations

 A medical record system for a health care facility

 Your personal address book in your email client

 A collection of word-processed documents

 A system that issues airline reservations


 Day-to-day business processes executed by individuals and
organisations require both present and historical data.

 Therefore, data storage is essential for organisations and


individuals.

 Data supports business functions and aids in business


decision-making.

 Below are some of the examples where data storage


supports business functions.
Social media
 We access social media using our computers and mobile
phones.
 Every time we access social media, we interact, collaborate
and share content with other people.
 The owners of social media platforms store the data we
produce.

Supermarket
 A supermarket stores different types of information about its
products, such as quantity, prices and type of product.
 Every time we buy anything from the supermarket, quantities
must be reduced and the sales information must be stored.

Company
 A company will need to hold details of its staff, customers,
products, suppliers and financial transactions.
Flat-file Systems and Database Systems

Flat-file systems and database systems are two different approaches to storing and managing data. Here are the key differences between them:

Structure:
 Flat-file systems store data in simple, standalone files. Each file usually holds plain text or a single table of rows and columns, with no relationships defined between files.

 Database systems use a structured approach, typically organizing data into tables with relationships between them. This provides a more organized and efficient way to store and manage data.
Data Redundancy:
 Redundancy is common in flat-file systems because data is
typically duplicated in multiple files. Each file may contain a
subset of related information.

 Database systems aim to minimize data redundancy through


normalization techniques. Normalization involves organizing
data to reduce redundancy and dependency.

Scalability:
 As data grows, flat-file systems may become difficult to
manage and scale. Adding new data or modifying the
structure can be challenging.

 Database systems are designed to handle large amounts of


data and are more scalable. They provide mechanisms to
efficiently add, modify, and retrieve data.
Data Integrity:
 Maintaining data integrity can be challenging in flat-file
systems. Inconsistencies and errors may occur due to
redundancy and lack of data normalization.

 Database systems enforce data integrity through constraints,


relationships, and normalization. This helps ensure that data
is accurate and consistent.

Data Integrity refers to the accuracy, consistency, and reliability


of data stored in a database. It ensures that the data remains
valid, and accurately represents the real-world information it is
supposed to model, during its entire lifecycle, from input to
storage and retrieval.
Access and Retrieval:
 Retrieving specific information from flat-file systems can be
time-consuming, especially when dealing with large datasets.
Searching and querying capabilities are limited.

 Database systems provide robust querying and retrieval


capabilities. Structured Query Language (SQL) is commonly
used to interact with relational database systems.

Examples:
Text files, CSV (comma-separated values) files, and Excel
spreadsheets are examples of flat-file systems.

Relational database management systems (RDBMS) such as


MySQL, PostgreSQL, Oracle, and Microsoft SQL Server are
examples of database systems.
Summary:

In summary, the key differences lie in the organization,


redundancy, scalability, data integrity, and retrieval
capabilities.

Flat-file systems are simpler but may lack efficiency and


organization as data scales.

Database systems provide a structured, organized, and


scalable solution with features to ensure data integrity and
efficient access and retrieval of information.

As a result, database systems are widely used in various


applications, especially when dealing with large and
complex datasets.
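The difference in retrieval can be sketched in a few lines of Python. The example below answers the same question ("which products are in stock?") first against a CSV flat file and then against a database table. The product data and all names are hypothetical, and `sqlite3` stands in for a full RDBMS.

```python
import csv
import io
import sqlite3

# A flat file: one CSV "file", held in memory for the example.
flat_file = io.StringIO(
    "product,price,quantity\n"
    "bread,25.0,40\n"
    "milk,18.5,0\n"
    "sugar,60.0,12\n"
)

# Flat-file retrieval: the program reads every line and filters by hand.
in_stock_csv = sorted(
    row["product"]
    for row in csv.DictReader(flat_file)
    if int(row["quantity"]) > 0
)

# Database retrieval: the same question stated declaratively in SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (name TEXT PRIMARY KEY, price REAL, quantity INTEGER)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("bread", 25.0, 40), ("milk", 18.5, 0), ("sugar", 60.0, 12)])
in_stock_sql = [r[0] for r in conn.execute(
    "SELECT name FROM product WHERE quantity > 0 ORDER BY name")]

print(in_stock_csv, in_stock_sql)
```

Both approaches return the same answer here. The practical difference shows up as data grows: the flat-file code must parse and scan everything itself, while the DBMS can use indexes and its own query planning.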
Basic Database Terminology and Features
1. Data Vs Metadata

In the context of databases, the two terms are different:

Data:
 Data refers to the actual information stored in a
database.

 It represents the records, values, and content that are


stored within the tables or collections of the database.

 Data can be structured, semi-structured, or


unstructured, depending on the type of database and
the nature of the information being stored.
 Examples of data include customer names, addresses,
product prices, sales transactions, employee salaries,
and so on.

 Data is what users typically interact with when querying,


inserting, updating, or deleting information in a
database.

Metadata:
 Metadata refers to data that describes other data. It
provides additional context, structure, and information
about the data stored in a database.

 Metadata describes the characteristics, properties, and


attributes of the data, including its structure, format,
relationships, and usage.
 Examples of metadata in a database include table
names, column names, data types, constraints,
indexes, relationships between tables, and data
definitions.

While data represents the actual information stored in a


database, metadata provides additional information about
the data, such as its structure, relationships, and
properties.

Data is the content of the database, while metadata is the


information about that content.

Both data and metadata are critical components of


database management and play important roles in data
organization, interpretation, and utilization.
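The distinction can be seen directly in a running database. The sketch below, which assumes a hypothetical `customer` table in SQLite, reads the data with an ordinary `SELECT` and reads the metadata through SQLite's `PRAGMA table_info` and its `sqlite_master` catalog table. Other DBMSs expose metadata through their own catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Chanda', 'Lusaka')")

# Data: the stored content itself.
data = conn.execute("SELECT customer_id, name, city FROM customer").fetchall()

# Metadata: PRAGMA table_info returns one row per column, laid out as
# (cid, name, type, notnull, dflt_value, pk). Keep the name and type.
columns = [(row[1], row[2])
           for row in conn.execute("PRAGMA table_info(customer)")]

# Metadata: sqlite_master lists the objects (here, tables) in the database.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

print(data)
print(columns)
print(tables)
```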
2. Data Dictionary Vs Metadata

Metadata is a superset of a data dictionary.

 Metadata refers to "data about data." It provides descriptive,


structural, and administrative information about data, including
details like data type, format, source, creation date,
relationships, and usage policies.

 A Data Dictionary is a subset of metadata that specifically


documents the structure and definitions of data elements within
a database, including table names, column names, data types,
constraints, and relationships.

 Data dictionary is a part of metadata that focuses on defining


the structure of data within a system. Since metadata includes
additional details beyond just structure, it serves as a superset
of the data dictionary.
3. Sharing of data and multi-user system

 Current database systems are designed for multiple users.

 That is, they allow many users to access the same database
at the same time.

 This access is achieved through features called concurrency


control strategies.

 These strategies ensure that the data accessed are always


correct and that data integrity is maintained.

 The design of modern multiuser database systems is a great improvement over earlier systems, which restricted usage to one person at a time.
4. Control of data redundancy and data consistency

 In the database approach, ideally, each data item is stored in only one place in the database.

 In some cases, data redundancy still exists to improve system performance, but such redundancy is controlled by application programming.

 Redundancy is kept to a minimum by introducing as little of it as possible when designing the database.

 With flat files, each program usually stores its own


separate files.
 If the same data is to be accessed by different
programs, then each program must store its own copy
of the same data.

 If the data is kept in different files, there could be


problems when an item of data needs updating, as
it will need to be updated in all the relevant files.

 If this is not done, the data will be inconsistent, and


this could lead to errors.

 In database systems, concurrency control features


ensure that data remains consistent and valid
during transaction processing even if several users
update the same information.
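The update problem described above can be reproduced in a small experiment. In this sketch (all table names and values are hypothetical), the first design repeats a customer's city on every order row, so a partial update leaves two conflicting values. The second design stores the city once, so a single update is enough.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Redundant design: the customer's city is repeated on every order row.
conn.execute(
    "CREATE TABLE order_flat ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT)")
conn.executemany("INSERT INTO order_flat VALUES (?, ?, ?)",
                 [(1, "Chanda", "Lusaka"), (2, "Chanda", "Lusaka")])

# The customer moves, but only one of the copies is updated.
conn.execute("UPDATE order_flat SET city = 'Ndola' WHERE order_id = 1")
cities_flat = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT city FROM order_flat WHERE customer = 'Chanda'"))

# Single-copy design: the city is stored once; orders refer to the customer.
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(customer_id));
INSERT INTO customer VALUES (1, 'Chanda', 'Lusaka');
INSERT INTO orders VALUES (1, 1), (2, 1);
UPDATE customer SET city = 'Ndola' WHERE customer_id = 1;
""")
cities_norm = sorted(r[0] for r in conn.execute(
    "SELECT DISTINCT c.city FROM orders AS o "
    "JOIN customer AS c ON c.customer_id = o.customer_id"))

print(cities_flat)  # two conflicting cities for the same customer
print(cities_norm)  # one consistent city
```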
Data Redundancy Vs Duplication of Data

Data redundancy and duplication of data are related concepts


and both involve the storage of redundant copies of data within
a database.

However, they are not exactly the same. Here are the specific
differences between the two phrases:

Data Redundancy:
 Data redundancy refers to the storage of the same piece of
data in multiple places within a database, which can occur
intentionally or unintentionally.

 Redundancy can exist at the field level (repeated values


within a single field across different records or tables) or at
the record level (identical records or tuples within a table or
across multiple tables).
 Redundancy can sometimes be intentional and used for
optimization purposes, such as denormalization for
performance improvements.

Duplication of Data:
 Duplication of data specifically refers to the presence of
multiple identical copies of the same data item within a
database, implying that the same data item is stored more
than once for no apparent reason.

 Duplication of data can occur due to errors in data entry, lack


of normalization in database design, or inefficient data
storage practices.

 Unlike redundancy, duplication of data is generally


considered undesirable and can lead to inconsistencies,
difficulty in data maintenance, and increased storage
requirements without providing any benefits.
While both terms involve the storage of identical copies of data, data redundancy can sometimes be intentional and used for optimization purposes, while duplication of data is generally considered undesirable and represents unnecessary repetition of data within a database.
5. Data sharing

 The integration of all the data, for an organization,


within a database system has many advantages.

 First, it allows for data sharing among employees and


others who have access to the system.

 Second, it gives users the ability to generate more


information from a given amount of data than would be
possible without the integration.

 Without integration, generating information from separate datasets leads to reports that must be reconciled with one another, which is unnecessary work.
6. Enforcement of integrity constraints

 Database management systems must provide the ability to


define and enforce certain constraints to ensure that users
enter valid information and maintain data integrity.

 A database constraint is a restriction or rule that dictates


what can be entered or edited in a table such as a date
using a certain format or adding a valid city in the City field.

 There are many types of database constraints. A data type constraint, for example, determines the sort of data permitted in a field, such as numbers only.

 A uniqueness constraint, such as a primary key, ensures that no duplicates are entered. Constraints can be simple (defined on a single field) or complex (enforced through programming).
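As an illustration of constraint enforcement, the sketch below declares three field-level constraints on a hypothetical `customer` table in SQLite and then tries one valid and three invalid inserts. The table, the list of accepted cities, and the helper `try_insert` are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,                               -- uniqueness
    name        TEXT NOT NULL,                                     -- value required
    city        TEXT CHECK (city IN ('Lusaka', 'Ndola', 'Kitwe'))  -- valid city
)
""")
conn.execute("INSERT INTO customer VALUES (1, 'Chanda', 'Lusaka')")


def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


results = {
    "valid row": try_insert((2, "Mwila", "Kitwe")),
    "duplicate key": try_insert((1, "Banda", "Ndola")),
    "missing name": try_insert((3, None, "Ndola")),
    "unknown city": try_insert((4, "Phiri", "Atlantis")),
}
print(results)
```

The rules live in the table definition, so every program that writes to the table is held to them without repeating the checks in application code.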
7. Restriction of unauthorized access

 Not all users of a database system will have the same access privileges.

 For example, one user might have read-only


access (i.e., the ability to read a file but not make
changes), while another might have read and write
privileges, which is the ability to both read and modify a
file.

 For this reason, a database management system


should provide a security subsystem to create and
control different types of user accounts and restrict
unauthorized access.
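The read-only versus read-write distinction can be approximated with SQLite, with one caveat: SQLite has no user accounts, so this sketch restricts a connection rather than a user. Server-based DBMSs typically manage per-user privileges with SQL statements such as `GRANT` and `REVOKE`. The file name `bank.db` and the `account` table are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# A throwaway database file for the example.
path = os.path.join(tempfile.mkdtemp(), "bank.db")

# Read-write connection: may create and modify data.
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL)")
rw.execute("INSERT INTO account VALUES (1, 500.0)")
rw.commit()
rw.close()

# Read-only connection: opened through a file URI with mode=ro.
ro = sqlite3.connect(Path(path).as_uri() + "?mode=ro", uri=True)
balance = ro.execute(
    "SELECT balance FROM account WHERE account_id = 1").fetchone()[0]

try:
    ro.execute("UPDATE account SET balance = 0 WHERE account_id = 1")
    write_allowed = True
except sqlite3.OperationalError:
    write_allowed = False  # SQLite refuses writes on a read-only connection
ro.close()

print(balance, write_allowed)
```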
8. Data independence

 Another advantage of a database management system is that it allows for data independence.

 In other words, the system's data descriptions (metadata, or data describing data) are separated from the application programs.

 This is possible because changes to the data structure


are handled by the database management system and
are not embedded in the program itself.

 This therefore means that the underlying structure of a


data file can be changed without the application
programs needing amendment.
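A small experiment shows what data independence means in practice. In this sketch, the function `staff_names` plays the role of an application program; the `staff` table and its columns are hypothetical. The table's structure is changed between the two calls, and the function is not edited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO staff VALUES (1, 'Chanda')")


def staff_names(connection):
    """Application code: refers to columns by name, not by storage layout."""
    return [r[0] for r in connection.execute(
        "SELECT name FROM staff ORDER BY staff_id")]


before = staff_names(conn)

# The structure changes: the DBMS adds a new column to the table.
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")
conn.execute("INSERT INTO staff VALUES (2, 'Mwila', 'mwila@example.com')")

after = staff_names(conn)  # the same function, unchanged

print(before, after)
```

A program that parsed a flat file by column position would usually need to be rewritten after a change like this.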
9. Backup and recovery facilities

 Backup and recovery are methods that allow you to


protect your data from loss.

 The database system provides its own process for backing up and recovering data, separate from that of a network backup.

 If a hard drive fails and the database stored on the hard


drive is not accessible, the only way to recover the
database is from a backup.

 Through the use of transaction logs, database systems


have the capacity to recover their data up to the point of
failure.
 If a computer system fails in the middle of a complex update process, the recovery subsystem is responsible for making sure that the database is restored to the state it was in before the update began.

 Backup and recovery features are two more benefits of


a database management system.
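Both ideas can be sketched with SQLite. This example only simulates a failure inside a running program by raising an exception, so it demonstrates rollback of an incomplete update rather than full crash recovery from transaction logs. The `account` table and the amounts are hypothetical. `Connection.backup` requires Python 3.7 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500.0), (2, 100.0)])
conn.commit()


def balances(connection):
    """Return all balances ordered by account number."""
    return connection.execute(
        "SELECT balance FROM account ORDER BY account_id").fetchall()


# Recovery: a transfer fails halfway through, after the first update.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE account SET balance = balance - 200 WHERE account_id = 1")
        raise RuntimeError("simulated failure before the second update")
except RuntimeError:
    pass
after_failure = balances(conn)  # the partial update has been undone

# Backup: copy the whole database into a second connection.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)
backup_copy = balances(backup_conn)

print(after_failure, backup_copy)
```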
Data Models

A data model is a conceptual representation and


abstraction of the structure and relationships within a
database or information system.

It defines how data is organized, stored, and accessed,


providing a blueprint for the logical and physical structure
of the data.

The purpose of a data model is to facilitate communication


between stakeholders, guide the design and
implementation of a database, and ensure consistency and
integrity of the stored information.

Key elements in defining a data model include:


1. Entities:
Entities represent the real-world objects or concepts about
which data is being stored. These can be physical objects,
events, or abstract concepts.

2. Attributes:
Attributes are properties or characteristics of entities. They
describe the data that can be associated with an entity.

For example, a "Customer" entity may have attributes such as


"CustomerID," "Name," and "Address."

3. Relationships:
Relationships define associations and connections between
entities. They describe how entities are related to each other
and can include cardinality (e.g., one-to-one, one-to-many) and
other constraints.
4. Constraints:
Constraints specify rules and conditions that must be
satisfied by the data to maintain consistency and integrity.

This can include primary key constraints, foreign key


constraints, uniqueness constraints, and more.

5. Keys:
Keys uniquely identify instances of an entity and establish
relationships between entities.

Common types of keys include primary keys, which


uniquely identify records in a table, and foreign keys, which
establish relationships between tables.
6. Normalization:
Normalization is the process of organizing data to reduce
redundancy and dependency. It involves breaking down
large tables into smaller, related tables to improve data
integrity and minimize data duplication.

7. Model Notations:
Data models can be represented using various notations.
Common notations include Entity-Relationship Diagrams
(ERD) for relational models, UML (Unified Modeling
Language) diagrams for object-oriented models, and
JSON or XML representations for document-oriented
models.
8. Types of Data Models:
Data models can be categorized into different types based
on their representation and structure. Some common types
include hierarchical, network, relational, object-oriented,
document, and more.

9. Abstraction Levels:
Data models can exist at different abstraction levels:
conceptual, logical, and physical.

 Conceptual models focus on high-level concepts and


relationships,
 Logical models refine the details for implementation,
and
 Physical models describe how data is stored on the
hardware.
In summary, a data model is a conceptual framework that
defines how data is organized and structured within a
database or information system.

It serves as a guideline for database design, allowing


stakeholders to understand and communicate the structure
of the data and ensuring that the database meets the
requirements of the intended application.
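Several of the elements listed above (entities, attributes, relationships, keys, and constraints) can be seen together in one small schema. The sketch below is one possible mapping, not the only one: the `customer` and `purchase` tables are hypothetical, and SQLite is used only because it ships with Python. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.executescript("""
CREATE TABLE customer (                    -- entity
    customer_id INTEGER PRIMARY KEY,       -- primary key
    name        TEXT NOT NULL,             -- attribute
    address     TEXT                       -- attribute
);
CREATE TABLE purchase (                    -- entity
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customer(customer_id),  -- foreign key: one customer, many purchases
    amount      REAL NOT NULL
);
INSERT INTO customer VALUES (1, 'Chanda', 'Lusaka');
INSERT INTO purchase VALUES (100, 1, 75.0);
""")

# A purchase that points at a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO purchase VALUES (101, 99, 20.0)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

purchase_count = conn.execute("SELECT COUNT(*) FROM purchase").fetchone()[0]
print(orphan_accepted, purchase_count)
```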
Classification of Database Systems

Database systems can be classified based on their


data models, which define the structure of the data
and the relationships between data elements.

The following slides provide an overview of some selected data models:
1. Hierarchical Data Model:

 Description: Represents data in a tree-like structure with a


hierarchy of parent and child relationships.

 Each child can have only one parent, creating a parent-child


relationship.

 Example: IMS (Information Management System), developed by IBM, is an example of a database system that uses a hierarchical data model.

 IMS is a name that refers to both a database management


system (DBMS) and the hierarchical database model
associated with it.

 In IMS, records are organized in a tree structure, and each


record type represents a level in the hierarchy.
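The single-parent rule is the defining feature of this model, and it can be sketched with a small tree in Python. This is an illustration of the structure only and is not how IMS is programmed; the `Record` class and the bank example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    name: str
    # At most one parent per record: this is the hierarchical restriction.
    parent: Optional["Record"] = field(default=None, repr=False, compare=False)
    children: List["Record"] = field(default_factory=list)

    def add_child(self, name: str) -> "Record":
        """Create a child record whose only parent is this record."""
        child = Record(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up through the single parent links to the root."""
        node, parts = self, []
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


bank = Record("Bank")
branch = bank.add_child("Lusaka Branch")
account = branch.add_child("Account 1001")
account_path = account.path()
print(account_path)
```

Because every record has exactly one path to the root, navigation is simple, but a record that logically belongs under two parents has to be duplicated.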
2. Network Data Model:

 Description: Extends the hierarchical model by


allowing each child to have multiple parents, creating a
more flexible structure.

 Nodes (records) can be connected through various


relationships.

 Example: CODASYL (Conference on Data Systems


Languages) databases, such as IDMS (Integrated
Database Management System), follow the network
data model.

 Records are connected through sets, and each record


can participate in multiple sets.
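The idea of one record taking part in several sets can be sketched with plain Python dictionaries. This is a simplified picture of owner and member records, not the CODASYL data manipulation language; the set names and records are hypothetical.

```python
from collections import defaultdict

# set name -> owner record -> list of member records
sets = defaultdict(lambda: defaultdict(list))


def connect(set_name, owner, member):
    """Add a member record to the named set under the given owner."""
    sets[set_name][owner].append(member)


# The same member record, "Loan 77", takes part in two different sets,
# so it effectively has two parents (a branch and a customer).
connect("branch_loans", "Lusaka Branch", "Loan 77")
connect("customer_loans", "Chanda", "Loan 77")


def owners_of(member):
    """List every (set, owner) pair that the member record belongs to."""
    return sorted(
        (set_name, owner)
        for set_name, by_owner in sets.items()
        for owner, members in by_owner.items()
        if member in members
    )


loan_owners = owners_of("Loan 77")
print(loan_owners)
```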
3. Relational Data Model:

 Description: Represents data in tables (relations) with


rows and columns.

 Tables are related based on common attributes, and


relational algebra is used for data manipulation.

 Example: Examples of relational database


management systems (RDBMS) include MySQL,
PostgreSQL, Oracle Database, and Microsoft SQL
Server.

 In a relational database, tables store related data, and


relationships are established through primary and
foreign keys.
4. Object-Oriented Data Model:

 Description: Represents data using objects, which


encapsulate data and behavior.

 Objects can have attributes (data) and methods


(functions), and relationships are established through
object references.

 Example: Object-oriented databases (OODBMS) like


db4o and ObjectDB follow this model.

 In these databases, data is stored in objects with


attributes and methods, and relationships are
established through references between objects.
5. Entity-Relationship Data Model:

 Description: Represents data using entities (objects)


and their relationships.

 Entities have attributes, and relationships describe


associations between entities.

 Example: The entity-relationship model is widely used


in conceptual database design. It helps in visualizing
and defining the structure of a database.

 For instance, when designing a relational database, an


entity-relationship diagram (ERD) is often created to
illustrate entities, attributes, and relationships.
6. Object-Relational Data Model:

 Description: Combines features of both the relational


and object-oriented data models.

 It extends the relational model to include complex data


types and methods associated with data.

 Example: Oracle Database and PostgreSQL support


object-relational features.

 They allow the storage of complex data types, such as


arrays and user-defined types, within a relational
database.
7. Document Data Model:

 Description: Represents data as documents, typically


in JSON or BSON format.

 Documents can be nested, and the model is well-suited


for hierarchical and semi-structured data.

 Example: MongoDB is a NoSQL database that follows


the document data model.

 Data is stored as flexible, JSON-like documents,


allowing for easy representation of complex and
dynamic data structures.
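A document of this kind can be sketched with Python's standard `json` module. The example does not use MongoDB itself; it only shows the shape of a nested, JSON-like document. The order document and its field names are hypothetical.

```python
import json

order_document = {
    "_id": 1,
    "customer": {"name": "Chanda", "city": "Lusaka"},  # nested document
    "items": [                                         # array of nested documents
        {"product": "bread", "quantity": 2, "price": 25.0},
        {"product": "milk", "quantity": 1, "price": 18.5},
    ],
}

# Documents are stored and exchanged in serialized form.
serialized = json.dumps(order_document)
restored = json.loads(serialized)

# The whole order, including its items, is read back as a single unit.
order_total = sum(item["quantity"] * item["price"] for item in restored["items"])
print(order_total)
```

In a relational design the customer and the items would normally sit in separate tables joined by keys; here they travel together inside one document.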
These data models provide different ways to organize and
represent data.

The choice of a particular model depends on the


requirements and characteristics of the data and the
application.

Each model has its strengths and weaknesses, and the


selection often involves considering factors such as data
complexity, relationships, and the need for flexibility and
scalability.
Database Classifications based on Distribution Types

Database systems can be classified based on their


distribution types.

This refers to how data is distributed and managed across


different nodes or locations.

Here's a brief overview of centralized, distributed,


homogeneous distributed, and heterogeneous distributed
database systems:
1. Centralized Database System:

Overview:
 In a centralized database system, all data is stored in a
single, central location, and a single database management
system (DBMS) manages and controls access to the data.

Characteristics:
 Data and processing are concentrated in one location.
 Users and applications access the central database for all
their data needs.
 Simple to manage but can be a single point of failure.

Example:
 Small businesses or applications with a limited user base
may use centralized database systems where all data is
stored on a single server.
With a centralized database system, the DBMS and database
are stored at a single site that is used by several other systems
too. This is illustrated in the figure below:
2. Distributed Database System:

Overview:
 A distributed database system involves the distribution
of data across multiple nodes or locations, and each
node has its own DBMS.
 The nodes are connected through a network.

Characteristics:
 Data is distributed to improve performance, scalability,
and fault tolerance.
 Different nodes may have local autonomy in managing
their data.
 Requires mechanisms for data distribution,
communication, and coordination.
Example:
 Large enterprises with geographically dispersed offices
or a cloud-based database system where data is
distributed across multiple servers.
3. Homogeneous Distributed Database System:

Overview:
 In a homogeneous distributed database system, all
nodes have the same DBMS software, and the data
model is consistent across all nodes.
 Data exchange between these various sites can be
handled easily.

Characteristics:
 Standardized software and data models across all
distributed nodes.
 Uniformity simplifies communication and coordination
between nodes.
 Easier to manage compared to heterogeneous
distributed systems.
Example:
 An organization using a single type of database
management system across all its distributed locations.

 For example, library information systems from a single vendor, such as Geac Computer Corporation, use the same DBMS software, which allows easy data exchange between the various Geac library sites.

 Libraries using Geac's software would therefore have a consistent database structure across multiple locations.
4. Heterogeneous Distributed Database System:

Overview:
 A heterogeneous distributed database system involves
different types of DBMS software or data models across
the distributed nodes.

 However, there is usually additional common software or a tool that supports data exchange between these sites.

 For example, the various library database systems use the same machine-readable cataloguing (MARC) format to support library record data exchange.
Characteristics:
 Each node may use a different DBMS, making data
integration more complex.

 Requires middleware or translation mechanisms to


facilitate communication between heterogeneous
systems.

 Allows for flexibility in choosing the most suitable DBMS


for each node.

Example:
 An organization using different types of databases (e.g.,
relational, NoSQL) across its distributed locations
based on specific requirements.
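The role of the translation layer can be sketched as follows. Two hypothetical nodes store book records in different ways, and two small adapter functions convert each node's records into one common exchange format. The node contents, field names, and adapter functions are all assumptions for illustration; real middleware is considerably more involved.

```python
import sqlite3

# Node A (hypothetical): a relational store.
node_a = sqlite3.connect(":memory:")
node_a.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT)")
node_a.execute("INSERT INTO book VALUES ('111', 'Database Concepts')")

# Node B (hypothetical): a document-style store with different field names.
node_b = [{"id": "222", "meta": {"name": "Data Models"}}]


def from_node_a(conn):
    """Translate relational rows into the common exchange format."""
    return [{"isbn": isbn, "title": title}
            for isbn, title in conn.execute("SELECT isbn, title FROM book")]


def from_node_b(docs):
    """Translate nested documents into the common exchange format."""
    return [{"isbn": d["id"], "title": d["meta"]["name"]} for d in docs]


# With both nodes expressed in one format, their records can be combined.
catalogue = sorted(from_node_a(node_a) + from_node_b(node_b),
                   key=lambda record: record["isbn"])
print(catalogue)
```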
Considerations:

 Performance: Distributed systems can improve


performance by distributing data closer to where it is
needed.

 Scalability: Distributed systems can scale horizontally


by adding more nodes as needed.

 Fault Tolerance: Distribution can enhance fault


tolerance as the failure of one node does not
necessarily impact the entire system.

 Complexity: Distributed systems introduce


complexities related to data consistency,
communication, and coordination.
 The choice between centralized and distributed
systems depends on factors such as the organization's
requirements, scalability needs, and the geographic
distribution of users and data.

 The decision to use homogeneous or heterogeneous


distributed systems depends on factors like
standardization, flexibility, and the diversity of data
management needs.
Key Terms

 database: a shared collection of related data used to


support the activities of a particular organization.

 database administrator (DBA): responsible for


authorizing access to the database, monitoring its use,
and managing all the resources to support the use of the
entire database system.

 database management system (DBMS): a collection


of programs that enables users to create and maintain
databases and control all access to them

 datatype: determines the sort of data permitted in a


field, for example, numbers only.
 distributed database system: the actual database and the
DBMS software are distributed across various sites that are
connected by a computer network

 heterogeneous distributed database system:


different sites might use different DBMS software, but there
is additional common software to support data exchange
between these sites

 homogeneous distributed database systems: use the


same DBMS software at multiple sites

 end user: people whose jobs require access to a database


for querying, updating, and generating reports

 metadata: defines and describes the data and


relationships between tables in the database
 multiuser database system: a database management
system that supports multiple users concurrently

 object-oriented data model: a data model in which information is represented in the form of objects as used in object-oriented programming

 single-user database system: a database


management system that supports one user at a time
Home Exercises

 What is a database management system (DBMS)?

 How is a DBMS distinguished from a file-based system?

 What is metadata?

 What are the properties of a DBMS?

 What is the difference between centralized and distributed


database systems?

 What is the difference between homogeneous distributed database systems and heterogeneous distributed database systems?
 In relation to database systems, discuss the general
concepts of the ACID (Atomicity, Consistency, Isolation,
and Durability) properties.

 Using appropriate examples, you are required to


provide a definition and detailed explanation on how
each of the four (4) ACID components operate.
QUESTIONS
