DBMS

DBMS Architecture 1-level, 2-Level, 3-Level

1-Tier Architecture

○ In this architecture, the database is directly available to the user: the user
works directly on the DBMS itself.

○ Any changes made here are applied directly to the database. This architecture
does not provide a convenient tool for end users.

○ The 1-Tier architecture is used for developing local applications, where
programmers can communicate directly with the database for a quick response.

2-Tier Architecture
○ The 2-Tier architecture is the same as a basic client-server model. In the
two-tier architecture, applications on the client end communicate directly with
the database on the server side. APIs such as ODBC and JDBC are used for this
interaction.

○ The user interfaces and application programs run on the client side.

○ The server side is responsible for providing functionality such as query
processing and transaction management.

○ To communicate with the DBMS, client-side applications establish a
connection with the server side.

3-Tier Architecture

○ The 3-Tier architecture adds another layer between the client and the server.
In this architecture, the client cannot communicate directly with the server.

○ The application on the client end interacts with an application server, which
in turn communicates with the database system.

○ The end user is unaware of the database behind the application server, and the
database is unaware of any user beyond the application server.

○ The 3-Tier architecture is used for large web applications.

Some Points to remember

Three-tier Architecture:
● Presentation Tier (Front-end): The user interface or application layer where users
interact with the database.
● Application (Middle) Tier: Manages application logic and processing, including
business rules and data validation.
● Data Tier (Back-end): Stores and manages the data. It includes the database
server and storage.
Components of DBMS:
● Data Definition Language (DDL): Used to define the structure of the database,
such as creating, altering, and deleting tables and relationships.
● Data Manipulation Language (DML): Involves operations like inserting, updating,
and querying data.
● Database Engine: Responsible for processing SQL queries, managing
transactions, and handling data integrity.
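The DDL/DML split and the database engine's role can be sketched with Python's built-in sqlite3 module (the table and column names below are invented for the example):

```python
import sqlite3

# In-memory database so the example leaves no files behind
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the structure of the database (create a table)
cur.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")

# DML: insert and update data inside that structure
cur.execute("INSERT INTO student (id, name, age) VALUES (1, 'Asha', 20)")
cur.execute("UPDATE student SET age = 21 WHERE id = 1")

# DML query: the database engine processes the SQL and returns rows
rows = cur.execute("SELECT name, age FROM student").fetchall()
print(rows)  # [('Asha', 21)]

conn.close()
```

The same CREATE/INSERT/SELECT division applies in any relational DBMS; only the connection API differs.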
Database Schema:
● Physical Schema: Describes how data is stored in the database, including details
like table structures, indexes, and storage mechanisms.
● Logical Schema: Represents the logical organization of data without concerning
itself with how the data will be stored physically.
Concurrency Control:
● Transaction Management: Involves ensuring the consistency and isolation of
transactions, typically through techniques like locking and timestamps to
manage concurrent access to data.
Data Independence:
● Logical Data Independence: Allows modification of the logical structure of the
database without affecting the application programs using the data.
● Physical Data Independence: Allows modification of the physical storage
structures without affecting the application programs.
Query Optimization:
● The DBMS optimizes queries to execute them in the most efficient manner,
considering factors like indexing, caching, and query execution plans.
Security and Authorization:
● DBMS provides mechanisms for authentication and authorization to ensure that
only authorized users can access the database and perform specific operations.
Backup and Recovery:
● DBMS includes features for regular data backups and recovery mechanisms to
restore the database to a consistent state in case of failures.
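As a small illustration of backup support, Python's sqlite3 exposes SQLite's online backup API; the database contents here are invented for the example:

```python
import sqlite3

# A live database with some data in it
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
src.execute("INSERT INTO account VALUES (1, 100.0)")
src.commit()

# Backup: copy the entire database to another connection
dest = sqlite3.connect(":memory:")
src.backup(dest)

# Recovery scenario: even after the source is gone, the backup
# still holds a consistent copy of the data
src.close()
print(dest.execute("SELECT balance FROM account").fetchone())  # (100.0,)
```

Production systems add scheduling, incremental backups, and transaction logs on top of this basic copy-and-restore idea.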
Data models

Data models provide a structured way to represent and organize data, defining how data
elements relate to each other and how they can be manipulated. They serve as
blueprints for designing databases and help ensure the efficient storage, retrieval, and
manipulation of information within a system. Data models can be conceptual, logical, or
physical, offering varying levels of abstraction and detail in describing the structure and
relationships of data in a given context.

Relational Data Model

The Relational Data Model is a foundational concept in database management that
organizes data into tables with rows and columns, establishing relationships between
them. Here are some key notes on the Relational Data Model:

1. Tables and Rows:
a. Data is organized into tables, also known as relations.
b. Each table consists of rows (tuples) and columns (attributes/fields).
c. Rows represent individual records, while columns represent attributes of
those records.
2. Primary Key:
a. Each table has a primary key, a unique identifier for each record.
b. Ensures the uniqueness of records and facilitates the establishment of
relationships between tables.
3. Foreign Key:
a. A foreign key is a column in one table that refers to the primary key in
another table.
b. Establishes relationships between tables, enforcing referential integrity.
4. Normalization:
a. The process of organizing data to reduce redundancy and dependency by
breaking down large tables into smaller, related tables.
b. Normalization ensures data integrity and minimizes update anomalies.
5. ACID Properties:
a. Relational databases adhere to ACID properties (Atomicity, Consistency,
Isolation, Durability) to ensure reliable and secure transaction processing.
6. Structured Query Language (SQL):
a. SQL is the standard language for interacting with relational databases.
b. Allows users to define, manipulate, and query data in a relational
database.
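A minimal sketch of primary keys, foreign keys, and referential integrity, using SQL through Python's sqlite3 (the schema names are illustrative, not from the text):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# dept.id is a primary key; emp.dept_id is a foreign key referencing it
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE emp (
    id INTEGER PRIMARY KEY,
    name TEXT,
    dept_id INTEGER REFERENCES dept(id))""")

conn.execute("INSERT INTO dept VALUES (10, 'Sales')")
conn.execute("INSERT INTO emp VALUES (1, 'Ravi', 10)")   # valid reference

# Referential integrity: a row pointing at a missing dept is rejected
try:
    conn.execute("INSERT INTO emp VALUES (2, 'Mina', 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The rejected insert is exactly what "enforcing referential integrity" means: the DBMS refuses data that would break a foreign-key relationship.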

ER (Entity Relationship) Diagram in DBMS

○ ER model stands for Entity-Relationship model. It is a high-level data model
used to define the data elements and relationships for a specified system.

○ It develops a conceptual design for the database and provides a simple,
easy-to-design view of data.

○ In ER modeling, the database structure is portrayed as a diagram called an
entity-relationship diagram.

For example, suppose we design a school database. In this database, the student will
be an entity with attributes like address, name, id, age, etc. The address can be another
entity with attributes like city, street name, pin code, etc., and there will be a relationship
between them.
1. Entity:

An entity may be any object, class, person, or place. In the ER diagram, an entity is
represented by a rectangle.

Consider an organization as an example: manager, product, employee, department, etc.
can be taken as entities.

a. Weak Entity

An entity that depends on another entity is called a weak entity. A weak entity does
not have any key attribute of its own and is represented by a double rectangle.

2. Attribute

The attribute is used to describe a property of an entity. An ellipse is used to represent
an attribute.

For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute

The key attribute is used to represent the main characteristics of an entity. It represents
a primary key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute

An attribute that is composed of several other attributes is known as a composite
attribute. The composite attribute is represented by an ellipse, with its component
attributes shown as ellipses connected to it.

c. Multivalued Attribute

An attribute that can have more than one value is known as a multivalued attribute. A
double oval is used to represent a multivalued attribute.

For example, a student can have more than one phone number.
d. Derived Attribute

An attribute that can be derived from other attributes is known as a derived attribute. It
is represented by a dashed ellipse.

For example, A person's age changes over time and can be derived from another
attribute like Date of birth.

3. Relationship

A relationship is used to describe the relation between entities. A diamond (rhombus)
is used to represent a relationship.

Types of relationships are as follows:

a. One-to-One Relationship

When only one instance of each entity is associated with the relationship, it is known
as a one-to-one relationship.

For example, one female can marry one male, and one male can marry one female.

b. One-to-many relationship

When one instance of the entity on the left is associated with more than one instance
of the entity on the right, the relationship is known as a one-to-many relationship.

For example, a scientist can make many inventions, but each invention is made by one
specific scientist.

c. Many-to-one relationship

When more than one instance of the entity on the left is associated with only one
instance of the entity on the right, the relationship is known as a many-to-one
relationship.

For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-many relationship

When more than one instance of the entity on the left is associated with more than one
instance of the entity on the right, the relationship is known as a many-to-many
relationship.

For example, an employee can be assigned to many projects, and a project can have
many employees.
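In a relational schema, a many-to-many relationship such as employees and projects is typically implemented with a junction table. A hedged sketch (all table names invented) using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE project  (id INTEGER PRIMARY KEY, title TEXT);
-- junction table: one row per (employee, project) pair
CREATE TABLE assignment (
    emp_id  INTEGER REFERENCES employee(id),
    proj_id INTEGER REFERENCES project(id),
    PRIMARY KEY (emp_id, proj_id)
);
INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO project  VALUES (10, 'Payroll'), (20, 'Inventory');
-- Asha works on both projects; Payroll has both employees
INSERT INTO assignment VALUES (1, 10), (1, 20), (2, 10);
""")

# Count the projects per employee through the junction table
rows = conn.execute("""
    SELECT e.name, COUNT(*) FROM assignment a
    JOIN employee e ON e.id = a.emp_id
    GROUP BY e.name ORDER BY e.name
""").fetchall()
print(rows)  # [('Asha', 2), ('Ravi', 1)]
```

The composite primary key on the junction table prevents the same employee/project pair from being recorded twice.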

Object-based Data Model

An object-based data model is a type of data model that represents data using the
concepts of objects, classes, and inheritance. It is closely associated with
object-oriented programming and is designed to store and manipulate data in a way that
mirrors the real-world entities and their relationships. Here are some key points about
object-based data models:

★ Objects:
○ An object is an instance of a class, representing a real-world entity or
concept.
○ Objects encapsulate both data (attributes) and behaviors (methods or
functions).
★ Classes:
○ A class is a blueprint or template for creating objects.
○ It defines the properties (attributes) and behaviors (methods) that objects
of the class will have.
★ Attributes:
○ Attributes are properties or characteristics associated with objects.
○ They represent the data that objects of a class can hold.
★ Methods:
○ Methods are functions or procedures associated with a class.
○ They define the behavior or actions that objects of the class can perform.
★ Encapsulation:
○ Encapsulation is the bundling of data and methods within a class,
restricting access to the internal details of the object.
○ It promotes information hiding and helps maintain data integrity.
★ Inheritance:
○ Inheritance allows a class to inherit attributes and behaviors from another
class.
○ It promotes code reuse and the creation of a hierarchy of classes.
★ Polymorphism:
○ Polymorphism enables objects of different classes to be treated as
objects of a common base class.
○ It allows for flexibility in handling various object types through a common
interface.
★ Object Identity:
○ Each object has a unique identity, allowing it to be distinguished from
other objects.
○ Object identity is often represented by a unique identifier.
★ Complex Relationships:
○ Object-based data models can represent complex relationships between
objects, mirroring real-world scenarios more accurately.
★ Persistence:
○ Object-based data models often support the persistence of objects,
allowing them to be stored in databases and retrieved later.
★ Object Query Language (OQL):
○ OQL is a query language designed for object-oriented databases, allowing
users to query and manipulate objects using a syntax similar to SQL.
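Several of the concepts above (classes, encapsulation, inheritance, polymorphism, object identity) can be sketched in a few lines of Python; the class and attribute names are invented for illustration:

```python
class Vehicle:
    """Base class: a blueprint defining shared attributes and behavior."""
    def __init__(self, owner):
        self._owner = owner          # encapsulated attribute (underscore convention)

    def describe(self):              # method: behavior attached to the data
        return f"vehicle owned by {self._owner}"

class Car(Vehicle):
    """Inheritance: Car reuses Vehicle's attributes and overrides behavior."""
    def describe(self):              # polymorphism: same interface, different result
        return f"car owned by {self._owner}"

fleet = [Vehicle("Asha"), Car("Ravi")]
print([v.describe() for v in fleet])
# ['vehicle owned by Asha', 'car owned by Ravi']

# Object identity: two objects with equal attribute values are still distinct
a, b = Car("Mina"), Car("Mina")
print(a is b)  # False
```

An object-oriented database stores such objects directly, preserving their class, attributes, and identity across program runs.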
The semi-structured data model

The semi-structured data model is a type of data model that does not require a rigid
schema, offering flexibility in representing data with varying structures. Unlike the
structured data model of relational databases, semi-structured data does not adhere to
a strict, predefined schema. Instead, it allows for irregularities and variations in the data
structure. Here are key characteristics and points about the semi-structured data model:

1. Flexible Schema:
a. Semi-structured data does not conform to a fixed schema, allowing for
variations in the structure of the data.
2. Self-Describing:
a. Data is often self-describing, meaning that it may include metadata or tags
that provide information about the structure and meaning of the data
elements.
3. Common Representations:
a. Semi-structured data is commonly represented in formats like XML
(eXtensible Markup Language) and JSON (JavaScript Object Notation),
which can represent nested and hierarchical structures.
4. Hierarchy and Nesting:
a. Semi-structured data often exhibits hierarchical relationships and can be
organized in nested structures, such as trees or graphs.
5. Example Formats:
a. XML (eXtensible Markup Language): Uses tags to define elements and
their relationships in a hierarchical structure.
b. JSON (JavaScript Object Notation): Represents data as key-value pairs
and supports nested structures.
6. No Formal Data Definition Language (DDL):
a. Unlike relational databases with a formal Data Definition Language (DDL),
semi-structured data does not require a predefined schema.
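The flexible, self-describing nature of semi-structured data is easy to see with JSON in Python; the records below are invented examples:

```python
import json

# Two records with different shapes: semi-structured data tolerates this
records = """
[
  {"id": 1, "name": "Asha", "phones": ["111", "222"]},
  {"id": 2, "name": "Ravi", "address": {"city": "Pune", "pin": "411001"}}
]
"""
data = json.loads(records)

# The set of keys varies per record; the "schema" lives in the data itself
print(sorted(data[0].keys()))  # ['id', 'name', 'phones']
print(sorted(data[1].keys()))  # ['address', 'id', 'name']

# Hierarchy is expressed by nesting, not by joins between tables
print(data[1]["address"]["city"])  # Pune
```

A relational table would force both records into one fixed set of columns; here each record carries only the fields it needs.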
Normalization
Normalization is a database design technique used to organize data in a relational
database efficiently. The goal of normalization is to eliminate data redundancy, reduce
the likelihood of data anomalies, and ensure data integrity. This process involves
breaking down a large table into smaller, related tables while maintaining the
relationships between them. The result is a set of well-structured tables that adhere to
specific normal forms. The most common normal forms are the First Normal Form
(1NF), Second Normal Form (2NF), and Third Normal Form (3NF). Here is an overview of
these normal forms:

1. First Normal Form (1NF):
a. Ensures that the values in each column of a table are atomic and do not
contain repeating groups or arrays.
b. Each column should contain only one piece of information, and there
should be a primary key to uniquely identify each record.
2. Second Normal Form (2NF):
a. Builds on 1NF and addresses partial dependencies.
b. A table is in 2NF if it is in 1NF and if all non-key attributes are fully
functionally dependent on the primary key.
c. In other words, every non-prime attribute must be fully dependent on the
entire primary key, not just part of it.
3. Third Normal Form (3NF):
a. Builds on 2NF and addresses transitive dependencies.
b. A table is in 3NF if it is in 2NF and if all non-prime attributes are
non-transitively dependent on the primary key.
c. Transitive dependency means that a non-prime attribute depends on
another non-prime attribute, which, in turn, depends on the primary key.

Normalization is a crucial process for maintaining data consistency and reducing
redundancy in a database. While these are the three most common normal forms,
higher normal forms (such as Boyce-Codd Normal Form, Fourth Normal Form, and Fifth
Normal Form) exist to address more complex scenarios. The choice of the normal form
to achieve depends on the specific requirements and characteristics of the data being
modeled.
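The redundancy and update-anomaly argument can be made concrete with a small sketch in Python's sqlite3 (the employee/department data is invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: the department's city is repeated on every employee row
unnormalized = [
    (1, 'Asha', 'Sales', 'Mumbai'),
    (2, 'Ravi', 'Sales', 'Mumbai'),   # 'Sales'/'Mumbai' duplicated
    (3, 'Mina', 'HR',    'Pune'),
]

# Normalized: each department fact is stored once and referenced by key
conn.executescript("""
CREATE TABLE dept (name TEXT PRIMARY KEY, city TEXT);
CREATE TABLE emp  (id INTEGER PRIMARY KEY, name TEXT,
                   dept TEXT REFERENCES dept(name));
""")
conn.executemany("INSERT OR IGNORE INTO dept VALUES (?, ?)",
                 [(d, c) for _, _, d, c in unnormalized])
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                 [(i, n, d) for i, n, d, _ in unnormalized])

# The update anomaly disappears: moving a department is one single-row update
conn.execute("UPDATE dept SET city = 'Nagpur' WHERE name = 'Sales'")
rows = conn.execute("""SELECT e.name, d.city FROM emp e
                       JOIN dept d ON d.name = e.dept ORDER BY e.id""").fetchall()
print(rows)  # [('Asha', 'Nagpur'), ('Ravi', 'Nagpur'), ('Mina', 'Pune')]
```

In the unnormalized layout the same update would have to touch every Sales row, and missing one would leave the data inconsistent.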
Data independence

Data independence is a concept in database management that refers to the separation
between the physical implementation of the database and the way data is logically
represented and accessed by applications. The goal of data independence is to allow
changes to be made in one aspect of the database system without affecting the other,
providing flexibility, adaptability, and ease of maintenance.

Indexing and hashing


Indexing and hashing are two techniques used in Database Management Systems
(DBMS) to optimize data retrieval operations, especially in scenarios with large
datasets. Both techniques aim to provide quick and efficient access to data, but they
have different approaches.

Indexing:
1. Definition:
a. Indexing is a data structure technique that involves creating an additional
data structure, known as an index, to improve the speed of data retrieval
operations.
b. The index contains a subset of the columns from the actual table and a
pointer to the corresponding rows.
2. Types of Indexes:
a. Clustered Index: Determines the physical order of data rows in the table
based on the indexed column. There can be only one clustered index per
table.
b. Non-Clustered Index: Creates a separate structure containing the indexed
column values and pointers to the actual rows.
3. Advantages:
a. Faster data retrieval: Indexes provide a shortcut to the relevant rows,
reducing the number of data pages that need to be accessed.
b. Improved query performance: Queries involving the indexed columns
benefit from quicker access to the required data.
c. Supports range queries: Indexes are beneficial for range-based searches.
4. Disadvantages:
a. Increased storage space: Indexes consume additional storage space, as
they are separate structures.
b. Overhead during updates: Maintaining indexes during insert, update, or
delete operations can impact performance.
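A hedged sketch of index creation in SQLite (via Python's sqlite3); the table and index names are invented, and the exact plan text varies by SQLite version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, f"cust{i % 100}", float(i)) for i in range(1000)])

# Non-clustered index on the column used in the WHERE clause
conn.execute("CREATE INDEX idx_customer ON orders(customer)")

# EXPLAIN QUERY PLAN shows whether the optimizer chose the index
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = 'cust7'"
).fetchall()
print(plan[0][-1])  # mentions idx_customer instead of a full table scan
```

Without the index, the same query plan would report a scan over the whole orders table.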

Hashing:
1. Definition:
a. Hashing is a technique that uses a hash function to map data to a
fixed-size array, known as a hash table.
b. The hash function takes an input (e.g., a key) and generates a hash code,
which is used as an index to access the corresponding location in the
hash table.
2. Hash Collisions:
a. Hash collisions occur when two different inputs produce the same hash
code. There are various methods to handle collisions, such as chaining
(linked lists at each hash table slot) or open addressing (probing for the
next available slot).
3. Advantages:
a. Constant-time access: Hashing can provide constant-time access to data
when there are no collisions.
b. Well-suited for equality searches: Hashing is efficient for equality-based
queries.
4. Disadvantages:
a. Poor support for range queries: Hashing is not well-suited for range-based
searches.
b. Collision resolution: Handling collisions adds complexity to the
implementation.
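The hash-table-with-chaining idea described above can be sketched in a few lines of Python (a toy illustration, not a production structure):

```python
class ChainedHashTable:
    """A tiny hash table using chaining: each slot holds a list of
    (key, value) pairs, so colliding keys share a slot instead of
    overwriting each other."""

    def __init__(self, size=8):
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.slots)   # hash function -> slot index

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                     # key already present: update in place
                bucket[i] = (key, value)
                return
        bucket.append((key, value))          # new key (or collision): chain it

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable(size=2)             # tiny size forces collisions
for k, v in [("ant", 1), ("bee", 2), ("cat", 3)]:
    table.put(k, v)
print(table.get("bee"))  # 2
```

With only two slots and three keys, at least one slot must chain multiple pairs, yet lookups still return the right value.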

When to Use Each:

● Indexing:
○ Well-suited for scenarios where range queries and sorted order are
important.
○ Effective when there is a need for a specific ordering of rows.
○ Often used with relational databases supporting complex queries.
● Hashing:
○ Effective for equality-based searches, such as lookup operations.
○ Suitable for scenarios where constant-time access is crucial.
○ Commonly used in scenarios like in-memory databases and hash-based
file organizations.

The choice between indexing and hashing depends on the specific requirements of the
application, the nature of queries, and the expected workload on the database. In some
cases, a combination of both techniques may be employed to address different types of
queries efficiently.

Query processing definition & techniques

Query processing is the sequence of steps a DBMS performs to translate a high-level
query (typically SQL) into an efficient execution plan and run it against the stored data.
Common query processing techniques include:


1. Indexing:
a. Effective use of indexes to accelerate data retrieval for columns specified
in the WHERE clause.
b. The query optimizer may choose different access paths based on the
availability and selectivity of indexes.
2. Join Algorithms:
a. Various join algorithms, such as nested loop joins, hash joins, and merge
joins, are employed based on the size of tables and available indexes.
b. The optimizer selects the most suitable join strategy for the given query.
3. Sorting and Aggregation:
a. Efficient sorting and aggregation algorithms are crucial for processing
GROUP BY and ORDER BY clauses.
b. Techniques like hash-based or external sorting may be used.
4. Parallel Processing:
a. Distributing query execution across multiple processors or nodes to
parallelize data retrieval and processing.
b. Commonly used in large-scale database systems to improve performance.
5. Caching and Buffering:
a. Caching frequently accessed data or intermediate results in memory to
reduce the need for repeated disk I/O operations.
b. Buffering techniques help manage the movement of data between disk
and memory.
6. Query Rewrite:
a. Transforming a query into an equivalent but more efficient form.
b. Techniques may include using materialized views or restructuring complex
queries to simplify execution plans.
7. Cost-Based Optimization:
a. Evaluating the cost of different execution plans and selecting the plan with
the lowest estimated cost.
b. The cost model considers factors such as I/O costs, CPU costs, and
network costs.
8. Query Hints:
a. Providing hints or directives to the optimizer to influence the choice of
execution plan.
b. Useful when domain knowledge or specific requirements dictate a
particular approach.
9. In-Memory Processing:
a. Storing and processing data in memory rather than relying solely on
disk-based storage.
b. In-memory databases can significantly improve query processing speed.
10. Materialized Views:
a. Precomputing and storing the results of certain queries in advance.
b. Materialized views can be used to speed up complex queries by avoiding
redundant computations.
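As one concrete example from the join algorithms mentioned above, a hash join builds an in-memory hash table on one input and probes it with the other. A hedged Python sketch (the data and function name are invented):

```python
def hash_join(left, right, key_left, key_right):
    """Equi-join two lists of dicts: build a hash table on `left`,
    then probe it once per row of `right`."""
    # Build phase: hash the (ideally smaller) left input on its join key
    table = {}
    for row in left:
        table.setdefault(row[key_left], []).append(row)
    # Probe phase: one hash lookup per right row instead of a nested loop
    result = []
    for row in right:
        for match in table.get(row[key_right], []):
            result.append({**match, **row})
    return result

depts = [{"dept_id": 10, "dept": "Sales"}, {"dept_id": 20, "dept": "HR"}]
emps = [{"name": "Asha", "dept_id": 10}, {"name": "Ravi", "dept_id": 10},
        {"name": "Mina", "dept_id": 30}]  # 30 has no match: dropped (inner join)
joined = hash_join(depts, emps, "dept_id", "dept_id")
print([(r["name"], r["dept"]) for r in joined])
# [('Asha', 'Sales'), ('Ravi', 'Sales')]
```

Compared with a nested loop join, the probe phase costs one hash lookup per row rather than a scan of the whole other table, which is why optimizers favor hash joins for large equality joins.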

Questions
1. Collections of raw facts and figures are called Data, while manipulated and
processed data is called Information.
