
DBMS

What is a Database?
A database is an organized collection of data, which may be structured or unstructured. Its
primary goal is to store and manage large amounts of data.

What is Data?
Data is any information, thought, or media transferred from one person to another; everything
we say or communicate is a form of data. In computers, data takes many forms, such as raw
text, numbers, characters, bytes, and images.

What is DBMS (Database Management System)?

A Database Management System (DBMS) is specialized software for efficiently organizing,
storing, and retrieving data in a structured format. It enables users to create, update, and
retrieve information in a database, while also providing robust tools for managing security
and access controls within the database environment.

Key Features of DBMS

1. Data Modeling
2. Data Storage and Retrieval
3. Concurrency Control
4. Data Integrity and Security
5. Backup and Recovery

Classification of DBMS:

● Relational Database Management System (RDBMS)
Organizes data in tables with rows and columns.
Establishes relationships between tables using primary and foreign keys.
● Non-Relational Database Management System (NoSQL)
Organizes data in diverse structures like key-value pairs, documents, graphs, or
column-based formats.
Tailored to handle large-scale, high-performance scenarios efficiently.

Different Types of DBMS

1. Centralized DBMS
2. Decentralized DBMS
3. Relational DBMS
4. NoSQL DBMS
5. Hierarchical DBMS
6. Network DBMS
7. Object-Oriented DBMS

Here are examples of some widely used DBMS:

1. MySQL: A widely used open-source relational DBMS.
2. PostgreSQL: An open-source object-relational DBMS known for standards compliance
and extensibility.
3. Microsoft SQL Server: Microsoft's enterprise relational DBMS.
4. Oracle Database: A commercial, enterprise-grade relational DBMS.
5. MongoDB: A popular document-oriented NoSQL database.
6. SQLite: A lightweight, serverless relational database embedded directly in applications.

DBMS Use Cases:

1. Business Operations
2. E-commerce
3. Healthcare Records
4. Educational Institutions

Advantages of DBMS:

● DBMS is secure thanks to authentication and user authorization, and it stores data in the
database reliably.
● DBMS offers functionality to remove or minimize data redundancy with the help of
normalization techniques.
● DBMS provides different data views for different users.
● DBMS provides facilities to back up data and recover lost data.
● DBMS can be integrated with Python, Java, or any other programming language to make
use of the database.
● ACID properties ensure healthy transactions with the database.

What is the Relational Model?

The relational model for database management is an approach to logically represent and manage
the data stored in a database. In this model, the data is organized into a collection of
two-dimensional inter-related tables, also known as relations. Each relation is a collection of
columns and rows, where the column represents the attributes of an entity and the rows (or
tuples) represent the records.

Relational Model Concepts

1. Relation: Two-dimensional table used to store a collection of data elements.


2. Tuple: Row of the relation, depicting a real-world entity.
3. Attribute/Field: Column of the relation, depicting properties that define the relation.
4. Attribute Domain: Set of pre-defined atomic values that an attribute can take i.e., it
describes the legal values that an attribute can take.
5. Degree: It is the total number of attributes present in the relation.
6. Cardinality: It specifies the number of entities involved in the relation, i.e., it is the total
number of rows present in the relation.
7. Relational Schema: It is the logical blueprint of the relation i.e., it describes the design
and the structure of the relation. It contains the table name, its attributes, and their types:

TABLE_NAME(ATTRIBUTE_1 TYPE_1, ATTRIBUTE_2 TYPE_2, ...)

For our Student relation example, the relational schema will be:
STUDENT(ROLL_NUMBER INTEGER, NAME VARCHAR(20), CGPA FLOAT)

8. Relational Instance: It is the collection of records present in the relation at a given time.
9. Relation Key: It is an attribute or a group of attributes that can be used to uniquely
identify an entity in a table or to determine the relationship between two tables. Relation
keys can be of 6 different types (see the SQL sketch after this list):
a) Candidate Key
b) Super Key
c) Composite Key
d) Primary Key
e) Alternate Key
f) Foreign Key
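
As a sketch of how several of these keys look in SQL (the STUDENT relation comes from the
schema example above; the EMAIL column and the referenced COURSE table are assumptions
added for illustration):

CREATE TABLE STUDENT (
    ROLL_NUMBER INTEGER PRIMARY KEY,  -- primary key: the candidate key chosen as the main identifier
    EMAIL VARCHAR(50) UNIQUE,         -- alternate key: a candidate key not chosen as primary
    NAME VARCHAR(20),
    CGPA FLOAT,
    COURSE_ID INTEGER REFERENCES COURSE(COURSE_ID)  -- foreign key: links to another relation
);

A composite key simply lists more than one column in the key clause, e.g.
PRIMARY KEY (ROLL_NUMBER, COURSE_ID).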

ER (Entity Relationship) Diagram in DBMS

ER model stands for Entity-Relationship model. It is a high-level data model used to define the
data elements and relationships for a specified system.

For example, suppose we design a school database. In this database, the student will be an
entity with attributes like address, name, id, age, etc. The address can be another entity with
attributes like city, street name, pin code, etc., and there will be a relationship between them.
Entity: An entity may be any object, class, person, or place. In the ER diagram, an entity is
represented as a rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be
taken as entities.

Weak Entity: An entity that depends on another entity is called a weak entity. A weak entity
does not contain any key attribute of its own. A weak entity is represented by a double
rectangle.

Attribute: An attribute is used to describe a property of an entity. An ellipse is used to represent
an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.

a. Key Attribute: The key attribute is used to represent the main characteristic of an entity. It
represents a primary key. The key attribute is represented by an ellipse with the text underlined.
b. Composite Attribute: An attribute that is composed of several other attributes is known as a
composite attribute. The composite attribute is represented by an ellipse connected to the
ellipses of its component attributes.

c. Multivalued Attribute: An attribute that can have more than one value is known as a
multivalued attribute. A double oval is used to represent a multivalued attribute.
For example, a student can have more than one phone number.

d. Derived Attribute: An attribute that can be derived from another attribute is known as a
derived attribute. It is represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute such
as date of birth.
Relationship: A relationship is used to describe the relation between entities. Diamond or
rhombus is used to represent the relationship.

Types of relationship are as follows:


a. One-to-One Relationship: When a single instance of one entity is associated with a single
instance of the other entity, it is known as a one-to-one relationship.
For example, one female can marry one male, and one male can marry one female.

b. One-to-Many Relationship: When one instance of the entity on the left is associated with
more than one instance of the entity on the right, it is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by only one
specific scientist.

c. Many-to-One Relationship: When more than one instance of the entity on the left is
associated with only one instance of the entity on the right, it is known as a many-to-one
relationship.
For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-Many Relationship: When more than one instance of the entity on the left is
associated with more than one instance of the entity on the right, it is known as a many-to-many
relationship.
For example, an employee can be assigned to many projects, and a project can have many
employees.
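
As a sketch of how these cardinalities are realized in SQL (the EMPLOYEE, PROJECT, and
WORKS_ON table names are assumptions for illustration), a many-to-many relationship is
typically implemented with a junction table whose composite primary key pairs the two foreign
keys:

CREATE TABLE EMPLOYEE (EMP_ID INTEGER PRIMARY KEY, NAME VARCHAR(30));
CREATE TABLE PROJECT (PROJ_ID INTEGER PRIMARY KEY, TITLE VARCHAR(30));

-- Junction table: each row records one employee assigned to one project
CREATE TABLE WORKS_ON (
    EMP_ID INTEGER REFERENCES EMPLOYEE(EMP_ID),
    PROJ_ID INTEGER REFERENCES PROJECT(PROJ_ID),
    PRIMARY KEY (EMP_ID, PROJ_ID)
);

A one-to-many relationship, by contrast, needs only a single foreign key on the "many" side.
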
Relational Algebra

Relational algebra in DBMS is a procedural query language. Queries in relational algebra are
performed using operators. Relational algebra is the foundational building block for the modern
query language SQL and for modern Database Management Systems such as Oracle Database,
Microsoft SQL Server, IBM Db2, etc.

Types of Relational Operations in DBMS


In relational algebra, we have two types of operations:
● Basic Operations: selection (σ), projection (π), union (∪), set difference (−), Cartesian
product (×), and rename (ρ).
● Derived Operations: operations such as join, intersection, and division, which can be
expressed in terms of the basic ones.
Applying these operations over relations/tables gives us new relations as output.
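
For instance, the selection and projection operators correspond directly to SQL clauses (a
sketch over the CUSTOMERS table used in the examples later in this document):

σ AGE > 25 (CUSTOMERS)    is equivalent to    SELECT * FROM CUSTOMERS WHERE AGE > 25;
π NAME, AGE (CUSTOMERS)   is equivalent to    SELECT NAME, AGE FROM CUSTOMERS;
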
Basic Queries in SQL

A common computer language used for maintaining and modifying data in relational databases is
called SQL, or Structured Query Language. It offers an extensive command set for inserting,
updating, deleting, and querying data. Through the creation of statements that specify operations
on tables and their contents, SQL enables users to communicate with databases. It provides
freedom in managing database structure with commands like CREATE, ALTER, and DROP, as
well as in accessing specific information with SELECT queries. Because SQL offers a consistent
and effective method of working with data, it is indispensable for anyone working with
databases, from developers and data analysts to database administrators.

SELECT Statement: The SELECT statement is the foundation of any SQL query. It retrieves
data from one or more tables in a database. After the SELECT keyword, you list the columns you
want to retrieve data from. You can also use * to select all columns.

Syntax:
SELECT column1, column2, ...
FROM table_name;
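
For instance, against the CUSTOMERS table used in the WHERE-clause examples later in this
document:

SELECT NAME, AGE
FROM CUSTOMERS;            -- two named columns

SELECT * FROM CUSTOMERS;   -- all columns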

FROM Clause: The FROM clause specifies the table or tables from which you want to retrieve
the data. You list the table names after the FROM keyword. If you're selecting from multiple
tables, separate their names with commas. You can also use aliases to make your SQL code more
readable.

Syntax:
SELECT column1, column2, ...
FROM table1, table2, ...;
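
The alias form mentioned above looks like this (a sketch using the orders and customers tables
from the joins section below):

SELECT c.customer_name, o.order_date
FROM customers AS c, orders AS o      -- c and o are table aliases
WHERE c.customer_id = o.customer_id;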

WHERE Clause: The SQL WHERE clause is used to filter the results obtained by DML
statements such as SELECT, UPDATE, and DELETE. We can retrieve data from a single
table or from multiple tables (after a join operation) using the WHERE clause.

For instance, you can use the WHERE clause to retrieve details of employees of a department in
an organization, employees earning a salary above or below a certain amount, or details of
students eligible for scholarships. This clause essentially specifies which records are to be
retrieved and which are to be ignored.

Common comparison operators used in the WHERE clause include:

● = (equal to)
● <> or != (not equal to)
● < (less than)
● > (greater than)
● <= (less than or equal to)
● >= (greater than or equal to)
● BETWEEN (between an inclusive range)
● IN (matches any of a list of values)
● LIKE (matches a pattern)
● IS NULL (checks for null values)

Syntax:

SELECT column1, column2, ...


FROM table_name
WHERE condition;

WHERE Clause with AND, OR Operators:

WHERE (condition1 OR condition2) AND condition3;

Examples of WHERE Clause:

1. SELECT * FROM CUSTOMERS
WHERE NAME IN ('Khilan', 'Hardik', 'Muffy');

2. SELECT * FROM CUSTOMERS
WHERE AGE NOT IN (25, 23, 22);

3. SELECT * FROM CUSTOMERS
WHERE NAME LIKE 'K___%';

GROUP BY Clause:

The GROUP BY statement groups rows that have the same values into summary rows, like "find
the number of customers in each country".
The GROUP BY statement is often used with aggregate functions (COUNT(), MAX(), MIN(),
SUM(), AVG()) to group the result-set by one or more columns.
Syntax:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
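
A concrete version of the "customers per country" example, using the Customers table from the
HAVING examples below:

SELECT Country, COUNT(CustomerID) AS num_customers
FROM Customers
GROUP BY Country;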

ORDER BY Clause: The ORDER BY clause sorts the result set based on one or more columns.
By default, sorting is done in ascending order, but you can specify descending order using the
DESC keyword after the column name.

Syntax:
SELECT column1, column2, ...
FROM table_name
ORDER BY column1 ASC/DESC, column2 ASC/DESC, ...;
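
For example, to list customers oldest first, breaking ties alphabetically by name:

SELECT NAME, AGE
FROM CUSTOMERS
ORDER BY AGE DESC, NAME ASC;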

Having Clause:

The HAVING clause was added to SQL because the WHERE keyword cannot be used with
aggregate functions. The HAVING clause places a condition on the groups defined by the
GROUP BY clause in the SELECT statement, and it is therefore evaluated after the GROUP BY
clause. Both WHERE and HAVING are used for filtering records: WHERE filters individual
rows before grouping, while HAVING filters the groups themselves.

Syntax:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);

Examples of HAVING Clause:

1. SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5
ORDER BY COUNT(CustomerID) DESC;

2. SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;

LIMIT Clause: The LIMIT clause restricts the number of rows returned by a query. It's often
used in conjunction with the ORDER BY clause for pagination or to retrieve the top N rows.
Syntax:
SELECT column1, column2, ...
FROM table_name
LIMIT number_of_rows;
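
LIMIT is the MySQL/PostgreSQL/SQLite spelling (SQL Server uses TOP or FETCH instead);
for pagination it is typically paired with OFFSET:

-- Rows 21-30, i.e., page 3 with 10 rows per page
SELECT NAME, AGE
FROM CUSTOMERS
ORDER BY NAME
LIMIT 10 OFFSET 20;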

DISTINCT Keyword: The DISTINCT keyword is used to eliminate duplicate rows from the
result set, returning only unique values for the specified columns.

Syntax:
SELECT DISTINCT column1, column2, ...
FROM table_name;
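
For example, to list each country in the Customers table only once:

SELECT DISTINCT Country
FROM Customers;
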

SQL Joins

SQL joins combine data from two or more tables based on a related column. Types include:

1. INNER JOIN: Retrieves rows with matching values in both tables.


2. LEFT JOIN: Includes all rows from the left table and matching rows from the right table.
3. RIGHT JOIN: Includes all rows from the right table and matching rows from the left
table.
4. FULL JOIN: Retrieves all rows from both tables, displaying NULLs where no match is
found.
5. CROSS JOIN: Generates a Cartesian product, combining every row from each table.

Understanding and utilizing these join types enables effective data aggregation and analysis in
SQL.

Notes on each join:

1. INNER JOIN: Inner join returns rows that have matching values in both tables based on
the join condition. It selects only the rows for which the join condition is true.

Example: Suppose we have two tables, orders and customers, with a common column
customer_id. An inner join between these tables will return only the orders that have a
matching customer record in the customers table.
SELECT orders.order_id, orders.order_date, customers.customer_name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;

2. LEFT JOIN (or LEFT OUTER JOIN): Left join returns all rows from the left table and
the matched rows from the right table. If no match is found in the right table, NULL
values are returned for the columns from the right table.

Example:

Continuing with the orders and customers tables, a left join will return all orders,
including those without a matching customer record. If a matching customer exists, it will
display the customer's name; otherwise, it will display NULL.

SELECT orders.order_id, orders.order_date, customers.customer_name
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.customer_id;

3. RIGHT JOIN (or RIGHT OUTER JOIN): Right join is similar to left join but returns all
rows from the right table and the matched rows from the left table. If no match is found in
the left table, NULL values are returned for the columns from the left table.

Example:

Using the same tables, a right join will return all customer records, including those
without corresponding orders. If orders exist for a customer, it will display the order
details; otherwise, it will display NULL.

SELECT orders.order_id, orders.order_date, customers.customer_name
FROM orders
RIGHT JOIN customers ON orders.customer_id = customers.customer_id;

4. FULL JOIN (or FULL OUTER JOIN): Full join returns all rows from both tables,
displaying NULL values where no match is found.

Example:

With a full join, all orders and customers are included in the result set. If there's a match,
it displays the order details along with the customer name; otherwise, it shows NULL for
either side.

SELECT orders.order_id, orders.order_date, customers.customer_name
FROM orders
FULL JOIN customers ON orders.customer_id = customers.customer_id;

5. CROSS JOIN: Cross join generates the Cartesian product of the two tables, producing all
possible combinations of rows from both tables.

Example:
A cross join between the orders and customers tables will produce a result set where each
order is paired with every customer, resulting in a combination of every order and every
customer.

SELECT orders.order_id, orders.order_date, customers.customer_name
FROM orders
CROSS JOIN customers;

Some Advanced Functions:

1. Subqueries:
Subqueries, also known as nested queries, are queries within another query. They allow
you to filter, sort, or manipulate data based on the results of another query. For example,
you might use a subquery to find all customers who have made purchases above a certain
threshold.

SELECT *
FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders WHERE
total_amount > 1000);

2. Aggregations:
Aggregation functions like SUM, AVG, COUNT, MIN, and MAX are used to perform
calculations on groups of rows. They are often combined with the GROUP BY clause to
group rows into subsets based on the values of one or more columns. The HAVING
clause is then used to filter the grouped rows based on aggregate values.

SELECT category, COUNT(*) AS num_products
FROM products
GROUP BY category
HAVING COUNT(*) > 5;

3. Window Functions:
Window functions allow you to perform calculations across a set of rows related to the
current row. They are used to calculate running totals, moving averages, rankings, and
more. Window functions operate on a "window" of rows defined by the OVER clause.
SELECT order_date, total_amount,
SUM(total_amount) OVER(ORDER BY order_date) AS cumulative_total
FROM orders;

4. Common Table Expressions (CTEs):
Common Table Expressions (CTEs) provide a way to define temporary result sets that
can be referenced within a query. They improve query readability and maintainability by
breaking down complex queries into smaller, more manageable parts.

WITH high_sales AS (
SELECT customer_id, SUM(amount) AS total_sales
FROM sales
GROUP BY customer_id
HAVING SUM(amount) > 10000
)
SELECT customers.customer_name, high_sales.total_sales
FROM customers
INNER JOIN high_sales ON customers.customer_id = high_sales.customer_id;

Data Manipulation Language (DML), Data Definition Language (DDL), Data Control
Language (DCL)

Data Manipulation Language (DML) is a subset of SQL (Structured Query Language) used for
managing data within database systems. DML primarily consists of commands for querying and
modifying data stored in the database tables. The main operations performed using DML are:

1. SELECT: This command is used to retrieve data from one or more tables based on
specified criteria. It allows users to query the database and extract the required
information. The SELECT statement can be customized with various clauses such as
WHERE, ORDER BY, GROUP BY, HAVING, etc., to filter, sort, and group the data as
needed.
2. INSERT: INSERT command is used to add new records or rows into a table. It allows
users to specify the values to be inserted into each column of the table. Users can insert
data into a single table or multiple tables using a single INSERT statement.
3. UPDATE: UPDATE command is used to modify existing data in a table. It allows users to
change the values of one or more columns in existing rows based on specified conditions.
The UPDATE statement requires users to specify the table name, column names to be
updated, new values, and optional conditions to identify the rows to be updated.
4. DELETE: DELETE command is used to remove one or more rows from a table based on
specified conditions. It allows users to permanently delete data from the table. Similar to
the UPDATE statement, the DELETE statement requires users to specify the table name
and optional conditions to identify the rows to be deleted.
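
A minimal sketch of the four DML commands against the CUSTOMERS table used in earlier
examples (the specific column values are illustrative assumptions):

SELECT * FROM CUSTOMERS WHERE AGE > 25;                   -- query matching rows
INSERT INTO CUSTOMERS (NAME, AGE) VALUES ('Khilan', 25);  -- add a new row
UPDATE CUSTOMERS SET AGE = 26 WHERE NAME = 'Khilan';      -- modify existing rows
DELETE FROM CUSTOMERS WHERE NAME = 'Khilan';              -- remove rows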

Data Definition Language (DDL) is a subset of SQL used for defining and managing the
structure of database objects such as tables, indexes, views, and constraints. DDL commands are
used to create, modify, and delete these objects within a database. The main DDL commands
include:
1. CREATE: CREATE command is used to create new database objects such as tables,
views, indexes, and constraints. It allows users to specify the name of the object, its
structure (e.g., column names, data types, and constraints), and any other relevant
properties.
2. ALTER: ALTER command is used to modify the structure of existing database objects. It
allows users to add, modify, or drop columns, constraints, indexes, or other properties of
tables, views, or indexes.
3. DROP: DROP command is used to delete existing database objects from the database. It
allows users to remove tables, views, indexes, constraints, or other objects permanently
from the database.
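
A minimal sketch of the three DDL commands (ADD COLUMN is the PostgreSQL/MySQL
spelling; some dialects use plain ADD):

CREATE TABLE CUSTOMERS (
    ID INTEGER PRIMARY KEY,
    NAME VARCHAR(30),
    AGE INTEGER
);
ALTER TABLE CUSTOMERS ADD COLUMN COUNTRY VARCHAR(30);  -- change the structure
DROP TABLE CUSTOMERS;                                  -- remove the object entirely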

Data Control Language (DCL) is a subset of SQL used for managing access to the database and
controlling permissions on database objects. DCL commands are used to grant or revoke
privileges to users and roles, allowing them to perform specific actions on database objects. The
main DCL commands include:
1. GRANT: GRANT command is used to give specific privileges or permissions to users or
roles on database objects. It allows users to specify the type of privilege (e.g., SELECT,
INSERT, UPDATE, DELETE) and the object (e.g., table, view) on which the privilege is
granted.
2. REVOKE: REVOKE command is used to revoke previously granted privileges from
users or roles. It allows users to remove specific privileges from users or roles on database
objects.
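
A sketch, assuming a user or role named analyst exists:

GRANT SELECT, INSERT ON CUSTOMERS TO analyst;  -- give privileges on a table
REVOKE INSERT ON CUSTOMERS FROM analyst;       -- take one privilege back
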

Normalisation
There are four normal forms that are commonly used in relational databases:
1. 1NF: A relation is in 1NF if all its attributes have atomic values.
2. 2NF: A relation is in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the candidate key.
3. 3NF: A relation is in 3NF if it is in 2NF and there is no transitive dependency.
4. BCNF: A relation is in BCNF if it is in 3NF and, for every functional dependency, the
left-hand side is a super key.

To understand the above-mentioned normal forms, we first need to have an understanding of the
functional dependencies.

Functional dependency is a relationship that exists between two sets of attributes of a relational
table, where one set of attributes can determine the value of the other set. It is denoted by
X -> Y, where X is called the determinant and Y the dependent.

First Normal Form (1NF)


A relation is in 1NF if every attribute is a single-valued attribute or it does not contain any
multi-valued or composite attribute, i.e., every attribute is an atomic attribute. If there is a
composite or multi-valued attribute, it violates the 1NF. To solve this, we can create a new row
for each of the values of the multi-valued attribute to convert the table into the 1NF.

Let’s take an example of a relational table <EmployeeDetail> that contains the details of the
employees of the company.

<EmployeeDetail>
Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.

To convert this table into 1NF, we make new rows with each Employee Phone Number as a new
row as shown below:

<EmployeeDetail>

Second Normal Form (2NF)

The normalization of 1NF relations to 2NF involves the elimination of partial dependencies. A
partial dependency exists when a non-prime attribute, i.e., an attribute that is not part of any
candidate key, is not fully functionally dependent on a candidate key.

For a relational table to be in second normal form, it must satisfy the following rules:

1. The table must be in first normal form.


2. It must not contain any partial dependency, i.e., all non-prime attributes are fully
functionally dependent on the primary key.

If a partial dependency exists, we can divide the table to remove the partially dependent attributes
and move them to some other table where they fit in well.

Let us take an example of the following <EmployeeProjectDetail> table to understand what a
partial dependency is and how to normalize the table to the second normal form:

<EmployeeProjectDetail>
In the above table, the prime attributes of the table are Employee Code and Project ID. We have
partial dependencies in this table because Employee Name can be determined by Employee Code
and Project Name can be determined by Project ID. Thus, the above relational table violates the
rule of 2NF.

The prime attributes in DBMS are those which are part of one or more candidate keys.

To remove partial dependencies from this table and normalize it into second normal form, we can
decompose the <EmployeeProjectDetail> table into the following three tables:

<EmployeeDetail>

<EmployeeProject>

<ProjectDetail>
Thus, we've converted the <EmployeeProjectDetail> table into 2NF by decomposing it into the
<EmployeeDetail>, <ProjectDetail>, and <EmployeeProject> tables. The resulting tables satisfy
the two rules of 2NF: they are in 1NF, and every non-prime attribute is fully dependent on the
primary key.

The relations in 2NF are clearly less redundant than relations in 1NF. However, the decomposed
relations may still suffer from one or more anomalies due to the transitive dependency. We will
remove the transitive dependencies in the Third Normal Form.
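
As a sketch of the 2NF decomposition in DDL (the column names follow the prose above; the
data types are assumptions):

CREATE TABLE EmployeeDetail (
    EmployeeCode INTEGER PRIMARY KEY,
    EmployeeName VARCHAR(30)
);
CREATE TABLE ProjectDetail (
    ProjectID INTEGER PRIMARY KEY,
    ProjectName VARCHAR(30)
);
-- Mapping table: the original composite key survives here
CREATE TABLE EmployeeProject (
    EmployeeCode INTEGER REFERENCES EmployeeDetail(EmployeeCode),
    ProjectID INTEGER REFERENCES ProjectDetail(ProjectID),
    PRIMARY KEY (EmployeeCode, ProjectID)
);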

Third Normal Form (3NF)


The normalization of 2NF relations to 3NF involves the elimination of transitive dependencies in
DBMS.

A functional dependency X -> Z is said to be transitive if the following three conditions hold:

● X -> Y
● Y -> X does not hold
● Y -> Z

For a relational table to be in third normal form, it must satisfy the following rules:

1. The table must be in the second normal form.
2. No non-prime attribute is transitively dependent on the primary key.
3. For each functional dependency X -> Z, at least one of the following conditions holds:
● X is a super key of the table.
● Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table to remove the transitively dependent
attributes and place them to a new table along with a copy of the determinant.

Let us take an example of the following <EmployeeDetail> table to understand what a transitive
dependency is and how to normalize the table to the third normal form:
<EmployeeDetail>

The above table is not in 3NF because it contains the transitive dependency Employee Code ->
Employee City, since:

● Employee Code -> Employee Zipcode
● Employee Zipcode -> Employee City

Also, Employee Zipcode is not a super key and Employee City is not a prime attribute. To
remove the transitive dependency from this table and normalize it into the third normal form,
we can decompose the <EmployeeDetail> table into the following two tables:

<EmployeeDetail>

<EmployeeLocation>
Thus, we’ve converted the <EmployeeDetail> table into 3NF by decomposing it into
<EmployeeDetail> and <EmployeeLocation> tables as they are in 2NF and they don’t have any
transitive dependency.

2NF and 3NF impose extra conditions on dependencies involving candidate keys and remove
the redundancy caused by them. However, there may still exist dependencies that cause
redundancy in the database. Such redundancies are removed by a stricter normal form known
as BCNF.

Boyce-Codd Normal Form (BCNF)


Boyce-Codd Normal Form (BCNF) is an advanced version of 3NF, as it imposes additional
constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the following rules:

1. The table must be in the third normal form.
2. For every non-trivial functional dependency X -> Y, X is a super key of the table. That
means X cannot be a non-prime attribute if Y is a prime attribute.

A superkey is a set of one or more attributes that can uniquely identify a row in a database table.

Let us take an example of the following <EmployeeProjectLead> table to understand how to
normalize the table to BCNF:

<EmployeeProjectLead>

The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF because
the candidate key of the above table is {Employee Code, Project ID}. For the non-trivial
functional dependency, Project Leader -> Project ID, Project ID is a prime attribute but Project
Leader is a non-prime attribute. This is not allowed in BCNF.
To convert the given table into BCNF, we decompose it into two tables:

<EmployeeProject>

<ProjectLead>

Thus, we’ve converted the <EmployeeProjectLead> table into BCNF by decomposing it into
<EmployeeProject> and <ProjectLead> tables.

Transaction
Transaction is a single logical unit of work formed by a set of operations. Operations in
Transaction:
● Read Operation - Read(A) will read the value of 'A' from the database and store it in a
buffer in main memory.
● Write Operation - Write(A) will write the updated value of 'A' from the buffer to the
database.

States in the life cycle of a transaction:

● Active State -
● This is the first state in the life cycle of a transaction.
● A transaction is called in an active state as long as its instructions are getting executed.
● All the changes made by the transaction now are stored in the buffer in main memory.
● Partially Committed State –
● After the last instruction of the transaction has been executed, it enters into a partially
committed state.
● After entering this state, the transaction is considered to be partially committed.
● It is not considered fully committed because all the changes made by the transaction are
still stored in the buffer in main memory.
● Committed State –
● After all the changes made by the transaction have been successfully stored into the
database, it enters into a committed state.
● Now, the transaction is considered to be fully committed.
● Failed State –
● When a transaction is getting executed in the active state or partially committed state
and some failure occurs due to which it becomes impossible to continue the execution, it
enters into a failed state.

● Aborted State –
● After the transaction has failed and entered into a failed state, all the changes made by it
have to be undone.
● To undo the changes made by the transaction, it becomes necessary to roll back the
transaction.
● After the transaction has rolled back completely, it enters into an aborted state.
● Terminated State –
● This is the last state in the life cycle of a transaction.
● After entering the committed state or aborted state, the transaction finally enters into a
terminated state where its life cycle finally comes to an end.
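
In SQL, this life cycle is driven by explicit transaction commands (a sketch; the ACCOUNTS
table is a hypothetical example, and BEGIN is spelled START TRANSACTION in MySQL and
BEGIN TRANSACTION in SQL Server):

BEGIN;                                                    -- transaction enters the active state
UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ID = 1;
UPDATE ACCOUNTS SET BALANCE = BALANCE + 100 WHERE ID = 2;
COMMIT;                                                   -- changes become durable (committed state)

-- On failure, we would instead issue ROLLBACK; to undo the changes (aborted state).
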
Database Security

Security of databases refers to the array of controls, tools, and procedures designed to ensure
and safeguard confidentiality, integrity, and availability. This section concentrates on
confidentiality, because it is the component most at risk in data security breaches.

Security for databases must cover and safeguard the following aspects:

○ The data in the database.

○ The database management system (DBMS).

○ Any applications associated with it.

○ The physical or virtual database server, and the hardware that runs it.

○ The computing or network infrastructure used to connect to the database.

Securing a database is a complicated and challenging task that involves all aspects of security
practices and technologies. This is inherently at odds with the accessibility of the database: the
more usable and accessible the database is, the more susceptible it is to security threats, while
the more invulnerable it is to threats, the more difficult it is to access and utilize.

Common Threats and Challenges

Incorrect software configurations, vulnerabilities, and patterns of carelessness or abuse can all
lead to a security breach. Here are some of the most prevalent types and causes of security
attacks:

Insider Threats:

An insider threat is a security threat from any of three sources with privileged access to the
database:

○ A malicious insider who wants to cause harm.

○ A negligent insider who makes mistakes that leave the database vulnerable to attack.

○ An infiltrator: an outsider who acquires credentials through a method such as phishing, or
by accessing the credential store in the database itself.

Insider threats are among the most frequent sources of database security breaches. They often
occur because too many employees hold privileged user credentials.

Human Error:

Unintentional mistakes, weak or shared passwords, and other negligent or uninformed user
behaviors remain the root cause of almost half (49 percent) of all data security breaches.

Exploitable Database Software Vulnerabilities:

Hackers earn their money by identifying and exploiting vulnerabilities in software such as
database management software. Major database software vendors and open-source database
management platforms release regular security patches to fix these weaknesses; however,
failing to apply the patches promptly increases the risk of being hacked.

SQL/NoSQL Injection Attacks:

A threat specific to databases is the insertion of arbitrary SQL (or non-SQL) attack strings into
database queries, typically delivered through web applications or HTTP headers. Organizations
that do not follow secure coding practices for web applications and do not conduct regular
vulnerability tests are susceptible to these attacks.
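
For instance, an attacker who supplies the value ' OR '1'='1 can turn a naively concatenated
query into one that matches every row. Server-side prepared statements keep the value out of
the SQL text (a sketch in MySQL's PREPARE syntax, with a hypothetical users table):

-- Vulnerable: user input spliced into the query string becomes
-- SELECT * FROM users WHERE name = '' OR '1'='1';

-- Safer: the parameter is bound as data and never parsed as SQL
PREPARE stmt FROM 'SELECT * FROM users WHERE name = ?';
SET @n = 'alice';
EXECUTE stmt USING @n;
DEALLOCATE PREPARE stmt;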

Buffer Overflow Exploits:

A buffer overflow happens when a program tries to copy more data into a fixed-length block of
memory than it can accommodate. Attackers may use the extra data, stored in adjacent memory
addresses, as a foundation from which to launch attacks.

Denial-of-Service (DoS/DDoS) Attacks:

In a denial-of-service (DoS) attack, the attacker overwhelms the targeted server (in this case, the
database server) with such a large volume of requests that the server can no longer fulfill
legitimate requests from actual users. In many cases, the server becomes unstable or crashes.

Malware:

Malware is software designed to exploit vulnerabilities or cause harm to databases. Malware
can arrive via any endpoint device that connects to the database's network.

Attacks on Backups:
Companies that do not protect backup data using the same rigorous controls employed to protect
databases themselves are at risk of cyberattacks on backups.

The following factors amplify the threats:

○ Data volumes are growing: Data capture, storage, and processing continue to increase
exponentially in almost all organizations. Any tools or methods must be highly flexible to
meet current as well as far-off needs.

○ The infrastructure is sprawling: Network environments are becoming more complicated,
especially as companies shift workloads into multi-cloud and hybrid-cloud architectures,
making the deployment, management, and administration of security solutions more
difficult.

○ More stringent regulatory compliance requirements: The worldwide regulatory
compliance landscape continues to grow in complexity, making compliance with every
mandate more challenging.

Transaction

A transaction is a logical unit of work that consists of one or more operations (such as
reads or writes) on a database. Transactions ensure that a set of operations either succeed
completely or fail completely, maintaining the database's consistency and integrity.
Problems due to Concurrency:
1. Lost Updates: Occur when multiple transactions modify the same data concurrently,
leading to one transaction's changes being overwritten or lost.
2. Dirty Reads: Happen when a transaction reads data that has been modified by another
uncommitted transaction, potentially leading to incorrect or inconsistent results.
3. Non-Repeatable Reads: Occur when a transaction reads the same data multiple times
but gets different results each time due to concurrent modifications by other transactions.
4. Phantom Reads: Involve a transaction seeing additional rows (phantoms) in a result set
due to concurrent inserts or deletions by other transactions.
5. Deadlocks: Arise when two or more transactions wait indefinitely for resources held by
each other, halting progress and requiring intervention.
6. Concurrency Control Overhead: Implementing concurrency control mechanisms adds
overhead in terms of system resources and performance, impacting throughput.

Concurrency Control:
Concurrency control mechanisms manage concurrent access to shared resources, such as
database records, by multiple transactions. These mechanisms prevent conflicts, maintain data
consistency, and ensure that transactions execute correctly in a multi-user environment.
Concurrency Control Techniques:
1. Locking: Transactions acquire locks (e.g., read locks, write locks) on data to control
concurrent access. Locks prevent conflicting operations from executing simultaneously,
ensuring data integrity.
2. Two-Phase Locking (2PL): Ensures that transactions obtain all necessary locks before
execution and release them only after completing the transaction. This prevents
conflicting transactions from accessing the same data concurrently.
3. Timestamp Ordering: Assigns timestamps to transactions and uses them to determine
the order of execution. Transactions are executed based on their timestamps to prevent
conflicts and ensure consistency.
4. Optimistic Concurrency Control: Transactions proceed optimistically without
acquiring locks initially but validate changes before committing. If conflicts are detected
during validation, appropriate actions are taken to maintain consistency.
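
Explicit locking (technique 1) can be requested directly in SQL (a sketch; SELECT ... FOR
UPDATE is supported by MySQL, PostgreSQL, and Oracle, and the ACCOUNTS table is a
hypothetical example):

BEGIN;
-- Lock the row so concurrent writers must wait until we commit
SELECT BALANCE FROM ACCOUNTS WHERE ID = 1 FOR UPDATE;
UPDATE ACCOUNTS SET BALANCE = BALANCE - 100 WHERE ID = 1;
COMMIT;   -- locks are released here, as in two-phase locking
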

Recoverability of Schedules:
Recoverability is a property of database systems that ensures that, in the event of a failure or
error, the system can recover the database to a consistent state. Recoverability guarantees that all
committed transactions are durable and that their effects are permanently stored in the database,
while the effects of uncommitted transactions are undone to maintain data consistency.
The recoverability property is enforced through the use of transaction logs, which record all
changes made to the database during transaction processing. When a failure occurs, the system
uses the log to recover the database to a consistent state, which involves either undoing the
effects of uncommitted transactions or redoing the effects of committed transactions.

There are several levels of recoverability that can be supported by a database system:
No-undo logging: This level of recoverability only guarantees that committed transactions are
durable, but does not provide the ability to undo the effects of uncommitted transactions.
Undo logging: This level of recoverability provides the ability to undo the effects of
uncommitted transactions but may result in the loss of updates made by committed transactions
that occur after the failed transaction.
Redo logging: This level of recoverability provides the ability to redo the effects of committed
transactions, ensuring that all committed updates are durable and can be recovered in the event of
failure.
Undo-redo logging: This level of recoverability provides both undo and redo capabilities,
ensuring that the system can recover to a consistent state regardless of whether a transaction has
been committed or not.
In addition to these levels of recoverability, database systems may also use techniques such as
checkpointing and shadow paging to improve recovery performance and reduce the overhead
associated with logging.
Overall, recoverability is a crucial property of database systems, as it ensures that data is
consistent and durable even in the event of failures or errors. It is important for database
administrators to understand the level of recoverability provided by their system and to configure
it appropriately to meet their application’s requirements.
Schedules Based on Recoverability
1. Recoverable Schedule:
● A schedule is recoverable if, for every pair of transactions T1 and T2 where T2
reads a data item previously written by T1, T1 commits before T2 starts its read
operation.
● In simpler terms, a recoverable schedule ensures that a transaction reading data
modified by another transaction only reads the changes after the modifying
transaction commits. This prevents dirty reads and ensures consistency during
recovery.
2. Cascadeless Schedule:
● A schedule is cascadeless if, for every pair of transactions T1 and T2 where T2
writes a data item previously read by T1, T1 commits before T2 starts its write
operation.
● Cascadeless schedules prevent cascading rollbacks, where a failure in one
transaction causes subsequent transactions to roll back, potentially leading to
inconsistencies and lost updates.
3. Strict Schedules:
● A strict schedule in databases ensures that conflicting operations on the same data
never occur concurrently. This approach maintains data consistency and integrity
but can reduce concurrency and impact performance.

Serializability of Schedules
1. Conflict Serializability:
● Definition: Conflict serializability is a concept in database concurrency control
that ensures that the final result of concurrent transactions is equivalent to some
serial execution of those transactions.
● Conflicts: Transactions conflict if they access the same data item and at least one
of them performs a write operation (write-write conflict) or if one transaction
reads a data item while another writes to it (read-write conflict).
● Serializable Schedules: A schedule of transactions is conflict serializable if it can
be transformed into a serial schedule by swapping non-conflicting operations
while maintaining the order of conflicting operations.
● Example: Consider two transactions T1 and T2:
● T1: Read X, Write Y
● T2: Read Y, Write X
● If T1 and T2 run concurrently in a way that their operations don't overlap,
the resulting schedule is conflict serializable because their operations can
be reordered into a serial schedule without conflicts.
2. View Serializability:
● Definition: View serializability is another criterion for determining whether a
schedule of transactions is equivalent to some serial schedule. It focuses on the
read and write operations of transactions.
● Equivalent Views: Two schedules are view equivalent if they produce the same
final database state when starting from the same initial state and executing the
same transactions, regardless of the actual order of operations within transactions.
● Example: Consider two schedules S1 and S2:
● S1: T1 reads A, T2 writes B, T1 writes C
● S2: T2 writes B, T1 reads A, T1 writes C
● Even though the order of operations within transactions differs in S1 and
S2, they are view equivalent if both result in the same final database state.

Indexing in DBMS

Indexing is used to quickly retrieve particular data from the database. Formally, we can define
indexing as a technique that uses data structures to optimize the searching time of a database
query. Indexing reduces the number of disk accesses required to reach a particular piece of data
by internally creating an index table.
Indexing is achieved by creating an Index table (or simply an Index).

An index usually consists of two columns forming a key-value pair. The two columns of the
index table (i.e., the key-value pair) contain copies of selected columns of the tabular data of the
database.
Here, the Search Key contains a copy of the Primary Key or a Candidate Key of the database
table. Generally, we store the selected Primary or Candidate Keys in sorted order so that we can
reduce the overall query or search time (from linear to binary search).

Indexing Attributes
Let's discuss the various indexing attributes:

Standard (B-tree) and Bitmap:

B-tree indexing is one of the most popular and commonly used indexing techniques. A B-tree
is a tree data structure whose entries contain two things: an Index Key and its corresponding
disk address.
The Index Key refers to a certain disk address, and that disk block contains the actual rows or
tuples of data.
Bitmap indexing, on the other hand, uses bit strings to store the addresses of tuples or rows; a
bitmap is a mapping from one system to another, such as integers to bits.
Bitmaps have an advantage over B-trees in that they can retrieve certain data faster (a bitmap is
built for specific data values) and are also more compact than B-trees.
The drawback of bitmaps is that they require more overhead during tuple operations on the
table. Hence, bitmaps are mainly used in data warehouse environments.
Example - Suppose we want to store a three-column table in the database; its B-tree
representation maps each Index Key to the disk address of the corresponding rows.

Note: Oracle Database uses Bitmap and B-trees.

Ascending and Descending:


As discussed above, the columns of the index are stored in some sorted manner. Generally, we
store these Search Keys in ascending order. Sorted keys allow us to search the data quickly. We
can change the sort order from ascending to descending, or to something else, according to the
most frequent queries on the database.

Syntax:
The syntax to create an index in descending order is:

CREATE INDEX index_name ON table_name (column_name DESC);

By default Sorting Order:


● Character Data: Sorted by ASCII values of the characters.
● Numeric Data: Smallest to largest numbers.
● Date: Earliest date to the latest date.
Column and Functional:
Generally, we prepare the index table with certain column values of the actual database but
sometimes we can also use predefined SQL functions like UPPER() or LOWER() or MAX(), etc.
to prepare the Search Keys.
Example - We can convert all values in a column to uppercase and store the results in the
index.

Syntax:

CREATE INDEX index_name ON table_name (UPPER(column_name));

Note: An index table formed using plain column values is also termed a Column Index or
Column Index-table.

Single-Column and Concatenated

We can create a single-column index table or a multi-column (concatenated) index table.
Concatenated indexes are built to match certain WHERE clauses (the WHERE clauses of the
most frequent SQL queries), hence making searching and data retrieval faster.

Example - Let us take an example of a multi-column index table:

We can use the primary key to create multiple index tables, such as indexing based on year
(grouping years) or indexing based on model name, etc. This multi-column indexing helps in
getting specific query results faster.
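
A sketch of a concatenated index for such a query (the vehicles table and its year and
model_name columns are assumptions for illustration):

-- Serves queries like: SELECT * FROM vehicles WHERE year = 2020 AND model_name = 'X';
CREATE INDEX idx_year_model ON vehicles (year, model_name);
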
Non-Partitioned and Partitioned
As we know, an index points to a certain table or block of data, but sometimes the data itself is
partitioned in a certain manner, so we need to partition the index table as well. Generally, we
use the same partition scheme for the index table as for the data table, which is known as a
Local Partition Index.

Example - Suppose we have a student table. If the student table is partitioned according to the
roll number (primary key), then the index table of the student table should be partitioned
according to roll number as well. This type of partitioning helps in the grouping of similar data
and faster query results.

Types of Indexes:

According to the attributes defined above, we divide indexing into three types:

Single Level Indexing: Just as the index of a book contains topic names along with page
numbers, the index table of a database contains keys and their corresponding block addresses.
Single Level Indexing is further divided into three categories:

1. Primary Indexing: The indexing or the index table created using Primary keys is known
as Primary Indexing.
Example:

Characteristics of Primary Indexing:


● Search Keys are unique.
● Search Keys are in sorted order.
● Search Keys cannot be null as it points to a block of data.
● Fast and Efficient Searching.

2. Secondary Indexing: It is a two-level indexing technique used to reduce the mapping
size of the primary index. The secondary index points to a certain location where the data
is to be found, but the actual data is not sorted as it is in primary indexing. Secondary
Indexing is also known as non-clustered indexing.
Example:

Characteristics of Secondary Indexing:


● Search Keys are Candidate Keys.
● Search Keys are sorted but actual data may or may not be sorted.
● Requires more time than primary indexing.
● Search Keys cannot be null.
● Faster than clustered indexing but slower than primary indexing.

3. Cluster Indexing: Clustered indexing is used when multiple related records are found in
one place. It is defined on ordered data. The important thing to note here is that the index
table of clustered indexing is created using non-key values, which may or may not be
unique. To achieve faster retrieval, we group columns having similar characteristics, and
the indexes are created using these groups; this process is known as clustering.
Example:
Characteristics of Clustered Indexing:
● Search Keys are non-key values.
● Search Keys are sorted.
● Search Keys cannot be null.
● Search Keys may or may not be unique.
● Requires extra work to create indexing.

Ordered Indexing:
Ordered indexing is the traditional way of storing indices that gives fast retrieval. The indices
are stored in a sorted manner, hence they are also known as ordered indices.
Ordered Indexing is further divided into two categories:
1. Dense Indexing: In dense indexing, the index table contains records for every search key
value of the database.
Example:

2. Sparse Indexing: Sparse indexing consumes less space than dense indexing, but it is a
bit slower as well. Instead of including a search key for every record, we store search
keys that each point to a block of records.
Example:
Multi-Level Indexing: Since the index table is stored in the main memory, single-level indexing
for a huge amount of data requires a lot of memory space. Hence, multilevel indexing was
introduced in which we divide the main data block into smaller blocks. This makes the outer
block of the index table small enough to be stored in the main memory.
Example:

We use the B+ Tree data structure for multilevel indexing. The leaf nodes of the B+ tree contain
the actual data pointers. The leaf nodes are themselves in the form of a linked list. This linked list
representation helps in both sequential and random access.
Advantages of Indexing
● Indexing helps in faster query results and quick data retrieval.
● Indexing helps in faster sorting and grouping of records.
● Some indexes use sorted and unique keys, which help retrieve sorted query results even
faster.
● Index tables are smaller in size, so they require less memory.
● As index tables are smaller, they can be stored in main memory.
● Since CPU speed and secondary memory speed differ greatly, the CPU uses the index
table in main memory to bridge this speed gap.
● Indexing helps in better CPU utilization and better performance.
