Lecture Notes 1 - AD - Database Concepts - An Overview
ADVANCED DATABASES
Database Concepts
An Overview
What is a Database?
Supermarket
A supermarket stores different types of information about its
products, such as quantity, prices and type of product.
Every time we buy anything from the supermarket, quantities
must be reduced and the sales information must be stored.
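The supermarket scenario can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (products, sales), the column names, the sample product and the record_sale helper are all made up for this example and do not come from a real system.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    " product_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " product_type TEXT NOT NULL,"
    " price REAL NOT NULL,"
    " quantity INTEGER NOT NULL CHECK (quantity >= 0))"
)
conn.execute(
    "CREATE TABLE sales ("
    " sale_id INTEGER PRIMARY KEY,"
    " product_id INTEGER NOT NULL REFERENCES products(product_id),"
    " units_sold INTEGER NOT NULL,"
    " total_price REAL NOT NULL)"
)
conn.execute("INSERT INTO products VALUES (1, 'Milk 1L', 'Dairy', 1.20, 50)")
conn.commit()


def record_sale(conn, product_id, units):
    """Reduce the stock and store the sale as one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        (price,) = conn.execute(
            "SELECT price FROM products WHERE product_id = ?", (product_id,)
        ).fetchone()
        conn.execute(
            "UPDATE products SET quantity = quantity - ? WHERE product_id = ?",
            (units, product_id),
        )
        conn.execute(
            "INSERT INTO sales (product_id, units_sold, total_price)"
            " VALUES (?, ?, ?)",
            (product_id, units, price * units),
        )


record_sale(conn, 1, 3)
remaining = conn.execute(
    "SELECT quantity FROM products WHERE product_id = 1"
).fetchone()[0]
sales_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(remaining, sales_count)  # 47 1
```

Both changes happen inside one transaction, so a sale is never stored without the matching reduction in quantity.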
Company
A company will need to hold details of its staff, customers,
products, suppliers and financial transactions.
Flat-file Systems and Database Systems
Structure:
Flat-file systems store data in a simple, flat file with no
built-in relationships between records. Data is usually
organized as plain text or as a single table of rows and columns.
Scalability:
As data grows, flat-file systems may become difficult to
manage and scale. Adding new data or modifying the
structure can be challenging.
Examples:
Text files, CSV (comma-separated values) files, and Excel
spreadsheets are examples of flat-file systems.
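As a small illustration of why flat files become hard to manage, the sketch below reads a CSV file with Python's standard csv module. The file contents and column names are made up for this example. The file itself enforces no types or rules, so the program has to do all the checking.

```python
import csv
import io

# Hypothetical contents of a products.csv flat file (made-up rows).
flat_file = io.StringIO(
    "product,type,price,quantity\n"
    "Milk 1L,Dairy,1.20,50\n"
    "Bread,Bakery,0.90,thirty\n"  # nothing stops a non-numeric quantity
)

rows = list(csv.DictReader(flat_file))

# Every value comes back as a string; the program must convert and validate.
bad_rows = [r for r in rows if not r["quantity"].isdigit()]

print(len(rows))                        # 2
print([r["product"] for r in bad_rows])  # ['Bread']
```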
Database Terminology and Features
1. Data vs. Metadata
Data:
Data refers to the actual information stored in a
database.
Metadata:
Metadata refers to data that describes other data. It
provides additional context, structure, and information
about the data stored in a database.
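The difference can be seen with Python's built-in sqlite3 module. In this sketch the staff table and its single row are made up. The SELECT returns data, while SQLite's PRAGMA table_info returns metadata about the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO staff VALUES (1, 'Alice')")

# Data: the actual stored values.
data = conn.execute("SELECT staff_id, name FROM staff").fetchall()

# Metadata: column names and declared types, as reported by SQLite.
# Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
metadata = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(staff)")
]

print(data)      # [(1, 'Alice')]
print(metadata)  # [('staff_id', 'INTEGER'), ('name', 'TEXT')]
```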
Multi-user database systems allow many users to access the
same database at the same time.
Data redundancy and duplication of data are closely related, but
they are not exactly the same. The specific differences between
the two phrases are:
Data Redundancy:
Data redundancy refers to the storage of the same piece of
data in multiple places within a database, which can occur
intentionally or unintentionally.
Duplication of Data:
Duplication of data specifically refers to the presence of
multiple identical copies of the same data item within a
database, implying that the same data item is stored more
than once for no apparent reason.
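One way to read the two definitions above is sketched below in plain Python. The order records are made up, and treating a repeated customer city as redundancy and an identical record as duplication is an interpretation for illustration only.

```python
from collections import Counter

# Made-up order records: (order_id, customer, customer_city)
orders = [
    (1, "Alice", "Leeds"),
    (2, "Alice", "Leeds"),  # city stored again in a second order: redundancy
    (3, "Bob", "York"),
    (3, "Bob", "York"),     # identical copy of a whole record: duplication
]

# Duplication: the same record stored more than once.
duplicates = [rec for rec, n in Counter(orders).items() if n > 1]

# Redundancy: the same fact (a customer's city) held in several
# distinct records. Identical copies are collapsed first with set().
records_per_customer = Counter(customer for _, customer, _ in set(orders))
redundant = [c for c, n in records_per_customer.items() if n > 1]

print(duplicates)  # [(3, 'Bob', 'York')]
print(redundant)   # ['Alice']
```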
2. Attributes:
Attributes are properties or characteristics of entities. They
describe the data that can be associated with an entity.
3. Relationships:
Relationships define associations and connections between
entities. They describe how entities are related to each other
and can include cardinality (e.g., one-to-one, one-to-many) and
other constraints.
4. Constraints:
Constraints specify rules and conditions that must be
satisfied by the data to maintain consistency and integrity.
5. Keys:
Keys uniquely identify instances of an entity and establish
relationships between entities.
7. Model Notations:
Data models can be represented using various notations.
Common notations include Entity-Relationship Diagrams
(ERD) for relational models, UML (Unified Modeling
Language) diagrams for object-oriented models, and
JSON or XML representations for document-oriented
models.
8. Types of Data Models:
Data models can be categorized into different types based
on their representation and structure. Some common types
include hierarchical, network, relational, object-oriented,
document, and more.
9. Abstraction Levels:
Data models can exist at different abstraction levels:
conceptual, logical, and physical.
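Attributes, relationships, constraints and keys from the list above can be seen together in a small relational schema. The sketch below uses Python's built-in sqlite3 module. The supplier and product tables, their columns and the try_insert helper are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE supplier (
    supplier_id INTEGER PRIMARY KEY,        -- key
    name        TEXT NOT NULL               -- attribute
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,        -- key
    name        TEXT NOT NULL,              -- attribute
    price       REAL CHECK (price >= 0),    -- constraint
    supplier_id INTEGER NOT NULL
        REFERENCES supplier(supplier_id)    -- relationship (one-to-many)
);
INSERT INTO supplier VALUES (1, 'Acme Foods');
INSERT INTO product VALUES (10, 'Milk 1L', 1.20, 1);
""")


def try_insert(sql):
    """Return True if the row is accepted, False if a rule rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Each of these rows breaks one rule and is rejected by the DBMS.
negative_price_ok = try_insert("INSERT INTO product VALUES (11, 'Bread', -1, 1)")
unknown_supplier_ok = try_insert("INSERT INTO product VALUES (12, 'Eggs', 2.5, 99)")
duplicate_key_ok = try_insert("INSERT INTO product VALUES (10, 'Tea', 3.0, 1)")

print(negative_price_ok, unknown_supplier_ok, duplicate_key_ok)
```

All three inserts are refused, so the product table still holds only the one valid row.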
1. Centralized Database System:
Overview:
In a centralized database system, all data is stored in a
single, central location, and a single database management
system (DBMS) manages and controls access to the data.
Characteristics:
Data and processing are concentrated in one location.
Users and applications access the central database for all
their data needs.
Simple to manage but can be a single point of failure.
Example:
Small businesses or applications with a limited user base
may use centralized database systems where all data is
stored on a single server.
With a centralized database system, the DBMS and the database
are stored at a single site that is shared by several other
systems. This is illustrated in the figure below:
2. Distributed Database System:
Overview:
A distributed database system involves the distribution
of data across multiple nodes or locations, and each
node has its own DBMS.
The nodes are connected through a network.
Characteristics:
Data is distributed to improve performance, scalability,
and fault tolerance.
Different nodes may have local autonomy in managing
their data.
Requires mechanisms for data distribution,
communication, and coordination.
Example:
Large enterprises with geographically dispersed offices
or a cloud-based database system where data is
distributed across multiple servers.
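The idea of spreading data across nodes can be sketched as follows. This is a toy model under clear assumptions: three in-memory SQLite databases stand in for three networked nodes, and the placement rule (customer_id modulo the number of nodes) is a hypothetical choice for this example. A real distributed DBMS also needs the communication and coordination mechanisms listed above.

```python
import sqlite3

# Three in-memory SQLite databases stand in for three networked nodes.
nodes = [sqlite3.connect(":memory:") for _ in range(3)]
for node in nodes:
    node.execute(
        "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)"
    )


def node_for(customer_id):
    """Hypothetical placement rule: choose a node from the key."""
    return nodes[customer_id % len(nodes)]


def insert_customer(customer_id, name):
    with node_for(customer_id) as node:  # commit on the chosen node only
        node.execute(
            "INSERT INTO customer VALUES (?, ?)", (customer_id, name)
        )


def find_customer(customer_id):
    row = node_for(customer_id).execute(
        "SELECT name FROM customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


for cid, name in [(1, "Alice"), (2, "Bob"), (3, "Carol"), (4, "Dave")]:
    insert_customer(cid, name)

rows_per_node = [
    n.execute("SELECT COUNT(*) FROM customer").fetchone()[0] for n in nodes
]
print(rows_per_node)     # [1, 2, 1]
print(find_customer(4))  # Dave
```

Each lookup touches only the node that holds the key, which is how distribution can spread the load across servers.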
3. Homogeneous Distributed Database System:
Overview:
In a homogeneous distributed database system, all
nodes have the same DBMS software, and the data
model is consistent across all nodes.
Data exchange between these various sites can be
handled easily.
Characteristics:
Standardized software and data models across all
distributed nodes.
Uniformity simplifies communication and coordination
between nodes.
Easier to manage compared to heterogeneous
distributed systems.
Example:
An organization using a single type of database
management system across all its distributed locations.
4. Heterogeneous Distributed Database System:
Overview:
A heterogeneous distributed database system involves
different types of DBMS software or data models across
the distributed nodes.
Example:
An organization using different types of databases (e.g.,
relational, NoSQL) across its distributed locations
based on specific requirements.
Considerations:
What is metadata?