DBMS Complete Syllabus

A database is a structured collection of data stored electronically, while a Database Management System (DBMS) is software that facilitates efficient data storage, retrieval, and management. The document outlines various aspects of databases, including data independence, instances and schemas, types of databases, and the roles of database administrators. It also explains the Entity-Relationship (ER) model, detailing entities, attributes, relationships, and the conversion from ER diagrams to relational models.

Uploaded by

sorure
Copyright
© All Rights Reserved

What is a Database?

• A database is a structured collection of data, generally stored and accessed electronically from a computer system, that facilitates easy access, management, and updates.

What is a Database Management System?

A DBMS is software that facilitates efficient data storage, retrieval, and management in databases.

It ensures data safety and integrity, while offering accessibility and concurrency control.

It supports functions like data querying, reporting, and analytics for informed decision-making.

| Aspect | File System | Database Management System |
|---|---|---|
| Data Access | Slower data retrieval due to unstructured querying capabilities. | Structured querying capabilities allow for quicker data access. |
| Data Isolation | Challenges in correlating data across separate files, leading to data isolation. | Facilitates data integration, reducing data isolation issues. |
| Data Integrity | Risk of inadvertent data alterations or deletions, creating integrity problems. | Features to prevent unauthorized data alterations, maintaining integrity. |
| Atomicity Problem | Potential for data inconsistency due to incomplete operations, leading to atomicity problems. | Supports transaction properties like atomicity, ensuring operations are either completed fully or not at all. |
| Concurrent Access Anomalies | Conflicts and inconsistencies from simultaneous data access/modifications, causing concurrent access anomalies. | Advanced concurrency controls to manage multiple users accessing the database simultaneously, reducing anomalies. |

View of Data Base (Data Abstraction)

● Physical Level

● Logical Level/ Conceptual Level

● View Level

Physical Level: The internal schema details data storage and access on hardware. This is the lowest level of data abstraction, with complex structures, and is predominantly managed by the database administrator.

Logical Level/ Conceptual Level: Above the physical level, this level presents data as entity sets and their relationships, detailing the types of data stored in the database and the connections between them.

View Level: This is the highest level of data abstraction, displaying only the portion of the database that interests a given user. It can represent multiple views of the same data, allowing users to access information from the database through various applications.

Data independence

Data independence is defined as the capacity to change the schema at one level of a
database system without having to change the schema at the next higher level.

Types of data independence:

Physical data independence:
a. Physical data independence is the ability to modify the internal schema without changing the conceptual schema.
b. Modification at the physical level is occasionally necessary in order to improve performance.
c. It refers to the immunity of the conceptual schema to changes in the internal schema.
d. Examples of physical data independence are reorganizing files, adding a new access path, or modifying indexes.

Logical data independence:
a. Logical data independence is the ability to modify the conceptual schema without having to change the external schemas or application programs.
b. It refers to the immunity of the external model to changes in the conceptual model.
c. Examples of logical data independence are the addition or removal of entities.

Instance and Schema

Instance of the Database:


The collection of information stored in the database at a specific moment is known as an
instance of the database. It is a snapshot of the database that contains live data at that
moment, showing the current state of all records and transactions.

Database Schema:
The database schema refers to the overall design of the database, illustrating the logical
structure and organization of data. It defines how data is organized and how relationships
between data are handled, essentially serving as the blueprint for how the database is
constructed.
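
The difference between schema and instance can be sketched with Python's built-in sqlite3 module. The `student` table, its columns, and the sample rows are hypothetical names chosen only for illustration. The schema stays the same while the instance changes as rows are inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the design of the table (hypothetical example table).
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def schema(conn):
    """Return (column name, declared type) pairs for the student table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

def instance(conn):
    """Return the rows stored right now: one snapshot (instance) of the data."""
    return conn.execute("SELECT roll_no, name FROM student ORDER BY roll_no").fetchall()

schema_before = schema(conn)
instance_before = instance(conn)   # snapshot taken before any insert

conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")

schema_after = schema(conn)        # unchanged by the inserts
instance_after = instance(conn)    # a different snapshot of the same schema
```

Comparing `schema_before` with `schema_after` and `instance_before` with `instance_after` shows that only the instance moved.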

| Aspect | OLAP (Online Analytical Processing) | OLTP (Online Transaction Processing) |
|---|---|---|
| Primary Function | Designed for complex data analysis and reporting. | Handles daily transactional data processing. |
| Database Design | Star or snowflake schema, optimizing for read operations. | Normalized schema, optimizing for write operations. |
| Query Complexity | Complex queries involving aggregations and computations across multiple dimensions. | Simple and standard queries focusing on CRUD operations (Create, Read, Update, Delete). |
| Data Volume | Deals with large volumes of data for historical analysis. | Processes a high number of small transactions. |
| Response Time | Slower response time due to complex queries. | Fast response time to support high transaction rates. |

Types of Databases

Commercial Database: Predominantly used in the business sector to handle large volumes of transactions and customer data. Example: a CRM system such as Salesforce.

Multimedia Database: Stores data types such as images, audio, and video files, facilitating the management and retrieval of multimedia content. Example: a digital asset management system such as Adobe Experience Manager.

Deductive Database: Uses logic programming to derive information from data stored in a database, allowing for more complex and analytical queries. Example: a database queried with Datalog (a query language), which allows complex logical queries and information derivation.

Temporal Database: Keeps track of changing data over time, allowing for queries concerning time-based data. Example: a historical trading database in the financial sector that tracks stock prices over time.

Geographic Information System (GIS): Stores, organizes, and analyzes geographical data, aiding in spatial analysis and mapping projects. Example: a system such as ArcGIS, which enables the storage, analysis, and visualization of geographical data.

DBA (Database Administrator)

Database administrators hold authority over data and the programs that facilitate data access. Their roles/functions are:

● Schema Definition: The DBA creates the original database schema by writing a set of definitions, which the DDL compiler translates into a set of tables stored permanently in the data dictionary.

● Storage Structure and Access Method Definition: The DBA creates appropriate storage structures and access methods by writing a set of definitions, which are translated by the data-storage and data-definition-language compiler.

● Schema and Physical Organization Modification: The DBA alters the database schema or the physical storage organization by writing definitions that modify the relevant internal system tables.

● Granting of Authorization for Data Access: The DBA grants different types of data access authorization to different database users.

● Integrity Constraint Specification: The DBA implements and maintains integrity constraints to ensure data accuracy and consistency.

1. Query Processor: The component of a DBMS that interprets and executes user queries. It comprises several sub-components, including:

1. DML Compiler: Translates Data Manipulation Language (DML) statements into low-level instructions that can be executed.

2. DDL Interpreter: Interprets Data Definition Language (DDL) statements and records the definitions in metadata tables.

3. Embedded DML Pre-compiler: Translates DML statements embedded in application programs into procedure calls.

4. Query Optimizer: Determines the most efficient way to execute a query by evaluating different query plans.

2. Storage Manager: Also known as the Database Control System, it is responsible for managing the data stored in the database, ensuring its consistency and integrity. It includes the following sub-components:

1. Authorization Manager: Manages access controls and privileges.

2. Integrity Manager: Ensures that data modifications adhere to integrity constraints.

3. Transaction Manager: Manages concurrent access to the database and maintains database consistency during transactions.

4. File Manager: Manages file space and the data structures representing information in the database.

5. Buffer Manager: Manages the data cache and data transfer between main memory and secondary storage.

3. Disk Storage: Represents the storage aspect of a DBMS, encompassing the following components:

1. Data Files: Files where the actual data is stored.

2. Data Dictionary: Repository containing information about the structure and characteristics of database objects.

3. Indices: Data structures that facilitate faster data retrieval.

The architecture of a Database Management System (DBMS) consists of three levels:

1. Internal Level: Concerns the physical storage of data in databases, overseeing data
storage on hardware devices, and managing low-level aspects like data compression and
indexing.

2. Conceptual Level: Represents the logical layout of the database, detailing the schema
with tables and attributes and their interrelations. It's independent of specific DBMS
implementations, focusing on organizing and connecting data elements.

3. External Level: Embodies the user interface of the database, facilitating data access and
interaction through user-friendly views and interfaces tailored to various user groups.

ER Diagram

Developed by Dr. Peter Chen in 1976, this conceptual-level method, grounded in real-world perceptions, facilitates diagrammatic data representation, simplifying comprehension for non-technical users.

The E-R data model, central to database design, encapsulates entities and their attributes within an enterprise schema, serving as a clear, standardized tool for translating real-world enterprise interactions into a conceptual schema.

ENTITY

An entity is a thing or an object in the real world that is distinguishable from other objects based on the values of the attributes it possesses.

An entity may be concrete, such as a person or a book, or it may be abstract, such as a course, a course offering, or a flight reservation.

Types of Entity

• Tangible - Entities which physically exist in the real world. E.g. - Car, Pen, Locker.
• Intangible - Entities which exist logically. E.g. - Account, Video.

An individual entity cannot be represented in an ER diagram, because an entity is an instance, not schema, and an ER diagram is designed to describe the schema.

• In a relational model an entity is represented by a row (a tuple or record) in a table.

Entity Set - A collection of entities of the same type that share the same properties or attributes.

In an ER diagram an entity set is represented by a rectangle.

In a relational model it is represented by a separate table.

Attributes

Attributes are the units that define and describe the properties and characteristics of entities.

Attributes are the descriptive properties possessed by each member of an entity set.

For each attribute there is a set of permitted values called its domain.

In an ER diagram attributes are represented by an ellipse (oval) connected to the rectangle of the entity set.

• In a relational model they are represented by independent columns, e.g. Instructor (ID, name, salary, dept_name).

Types of Attributes

Single valued - Attributes having a single value at any instant of time for an entity. E.g. - Aadhar no, dob.

Multivalued - Attributes which can have more than one value for an entity at the same time. E.g. - Phone no, email, address.

A multivalued attribute is represented by a double ellipse in an ER diagram and by an independent table in a relational model.

A separate table is created for each multivalued attribute, containing the multivalued attribute together with the pk of the main table as an fk.

Simple - Attributes which cannot be divided further into sub parts. E.g. - Age.

Composite - Attributes which can be further divided into sub parts, each a simple attribute. A composite attribute is represented by an ellipse connected to further ellipses, and in a relational model by a separate column for each sub part.

Stored - Main attributes whose value is permanently stored in the database. E.g. - date_of_birth.

Derived - Attributes whose value can be derived from the values of other attributes. E.g. - Age can be derived from date_of_birth and the current date.

Descriptive Attribute - An attribute of a relationship is called a descriptive attribute.

An attribute takes a null value when an entity does not have a value for it. The null value may indicate "not applicable", that is, that the value does not exist for the entity.

Relationship / Association

• A relationship is an association between two or more entities of the same or different entity sets.

• An individual relationship cannot be represented in an ER diagram, because it is an instance (data).

• In an ER diagram a relationship set is represented by a diamond, while in a relational model it is represented sometimes through a foreign key and at other times by a separate table.

Every relationship type has three components.

● Name

● Degree

● Structural constraints (cardinality ratios, participation)

NAME - Every relationship type must have a unique name.

Degree of a relationship/relationship set

● The number of entity sets (relations/tables) that participate in the relationship set.
● Most of the relationship sets in a database system are binary.
● Occasionally, however, relationship sets involve more than two entity sets.
● Logically, we can associate any number of entity sets in a relationship, called an N-ary Relationship.

Unary Relationship - A single entity set participates in the relationship, meaning two entities of the same entity set are related to each other.

• These are also called self-referential relationship sets.

E.g. - A member of a team may be the supervisor of another member of the team.

Ternary Relationship - When three entity sets participate in a relationship. E.g. - The University might need to record which teachers taught which subjects in which courses.

Quaternary Relationship - When four entity sets participate in a relationship.

N-ary Relationship - When n entity sets are associated.

• The most common relationships in ER models are binary.


Structural constraints (Cardinality Ratios, Participation)

An E-R enterprise schema may define certain constraints to which the contents of a database must conform.

MAPPING CARDINALITIES / CARDINALITY RATIOS

Express the number of entities to which another entity can be associated via a relationship set. For a binary relationship set between entity sets A and B, there are four possible categories:

• One to One (1:1) Relationship.

• One to Many (1:M) Relationship.

• Many to One (M:1) Relationship.

• Many to Many (M:N) Relationship.

One to One (1:1) Relationship - An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A.

E.g. - A directed line from the relationship set advisor to both entity sets indicates that an instructor may advise at most one student, and a student may have at most one advisor.

One to Many (1:M) Relationship - An entity in A is associated with any number (zero or more) of entities in B. An entity in B, however, can be associated with at most one entity in A.

E.g. - An instructor may advise many students, but a student may have at most one advisor.

Many to One (M:1) Relationship - An entity in A is associated with at most one entity in B. An entity in B, however, can be associated with any number (zero or more) of entities in A.

E.g. - A student may have many advisors, but an instructor can advise at most one student.

Many to Many (M:N) Relationship - An entity in A is associated with any number (zero or more) of entities in B, and an entity in B is associated with any number (zero or more) of entities in A.

E.g. - A student may have many advisors, and an instructor may advise many students.

PARTICIPATION CONSTRAINTS - Define how the entities of an entity type participate in a relationship.

• Partial participation: only some entities of the entity set participate in the relationship.

• Total participation: every entity of the entity set participates in at least one relationship.

STRONG AND WEAK ENTITY SET

• An entity set is called a strong entity set if it has a primary key, so that all the tuples in the set are distinguishable by that key.

• An entity set that does not possess sufficient attributes to form a primary key is called a weak entity set.

A weak entity set contains discriminator attributes (a partial key), which carry partial information about the entity set but are not sufficient to identify each tuple uniquely.

A weak entity set is represented by a double rectangle.


Conversion from ER Diagram to Relational Model

• Entity Set

● Convert every strong and weak entity set into a separate table. A weak entity set is made dependent on one strong entity set (the identifying or owner entity set).

• Relationship

● If unary: No separate table is required. Add a new column as fk which refers to the pk of the same table.

● If 1:1: No separate table is required. Take the pk of one side and put it as fk on the other side; priority must be given to the side having total participation.

● If 1:n or n:1: No separate table is required. Modify the n side by taking the pk of the 1 side as a foreign key on the n side.

● If m:n: A separate table is required. Take the pk of both tables and declare their combination as the pk of the new table.

● If degree is 3 or more: Take the pk of all participating entity sets as fk and declare their combination as the pk of the new table.

• Attributes

● Multivalued - A separate table must be created for each multivalued attribute, where we take the pk of the main table as fk and declare the combination of the fk and the multivalued attribute as the pk of the new table.

● Composite - A separate column must be created for each simple attribute of the composite attribute.
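
The m:n and multivalued-attribute rules above can be sketched with Python's built-in sqlite3 module. All table and column names (`student`, `course`, `enrolls`, `student_phone`) and the sample values are hypothetical, chosen only to illustrate the rules; this is one possible layout, not a prescribed one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two strong entity sets, each converted to its own table.
CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);

-- m:n relationship "enrolls": separate table, pk = combination of both pks.
CREATE TABLE enrolls (
    roll_no   INTEGER REFERENCES student(roll_no),
    course_id TEXT    REFERENCES course(course_id),
    PRIMARY KEY (roll_no, course_id)
);

-- Multivalued attribute "phone": separate table, pk = (fk, attribute).
CREATE TABLE student_phone (
    roll_no INTEGER REFERENCES student(roll_no),
    phone   TEXT,
    PRIMARY KEY (roll_no, phone)
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Asha')")
conn.executemany("INSERT INTO student_phone VALUES (?, ?)",
                 [(1, "555-0101"), (1, "555-0102")])

phones = [p for (p,) in conn.execute(
    "SELECT phone FROM student_phone WHERE roll_no = 1 ORDER BY phone")]

def insert_duplicate_phone():
    """Try to store the same (roll_no, phone) pair twice; the composite pk rejects it."""
    try:
        conn.execute("INSERT INTO student_phone VALUES (1, '555-0101')")
        return True
    except sqlite3.IntegrityError:
        return False
```

One student can hold several phone numbers because each number is its own row, while the composite primary key prevents the same pair from being stored twice.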

Generalization

Involves merging two lower-level entities to create a higher-level entity.

A bottom-up approach that builds complexity from simpler components.

Highlights similarities among lower-level entity sets while hiding differences.

Leads to a simplified, structured data representation, aiding in database design and querying processes.

Specialization

A process where a higher-level entity is broken down into more specific, lower-level entities.

This top-down approach breaks complexity down into simpler components.

Acts as the converse of the generalization process, focusing on differentiating properties rather than similarities.

Aggregation

A concept wherein relationships are abstracted to form higher-level entities, enabling a more organized representation of complex relationships.

ADVANTAGES OF E-R DIAGRAM

● Constructs used in the ER diagram can easily be transformed into relational tables.

● It is simple and easy to understand with minimum training.

DISADVANTAGES OF E-R DIAGRAM

● Loss of information content
● Limited constraints representation
● It is overly complex for small projects

RELATIONAL DATABASE MANAGEMENT SYSTEM

A relational database management system (RDBMS), conceptualized by Edgar F. Codd in 1970, serves as the foundation for most contemporary commercial and open-source database applications.

Central to its design is the use of tables for data storage, in which it maintains and enforces specific data relationships, marking a significant evolution in database design.

BASICS OF RDBMS

Domain (the set of permissible values in a particular column) is a set of atomic values.

By atomic we mean that each value in the domain is indivisible as far as the formal relational model is concerned.

A common method of specifying a domain is to specify a data type from which the data values forming the domain are drawn.

E.g. Names: The set of character strings that represent names of persons.

Table (Relation) - A relation is a set of tuples/rows/entities/records.

Tuple - Each row of a relation/table is called a tuple.

Arity/Degree - The number of columns/attributes of a relation. E.g. - Arity is 5 in table Student.

Cardinality - The number of rows/tuples/records of a relation instance. E.g. - Cardinality is 4 in table Student.

Properties of Relational tables

1. Cells contain atomic values
2. Values in a column are of the same kind
3. Each row is unique
4. No two tables can have the same name in a relational schema.
5. Each column has a unique name
6. The sequence of rows is insignificant
7. The sequence of columns is insignificant.

Problems in relational database

Update Anomalies - Anomalies that cause redundant work to be done during insertion into and modification of a relation, and that may cause accidental loss of information during a deletion from a relation.

● Insertion Anomalies
● Modification Anomalies
● Deletion Anomalies

Insertion Anomalies - An independent piece of information cannot be recorded in a relation unless some unrelated information is inserted together with it at the same time.

Deletion Anomalies - The deletion of a piece of information unintentionally removes other information.
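
A deletion anomaly can be seen in a small sketch. The `enrollment` relation below, its attribute names, and its rows are hypothetical; it deliberately mixes two facts (who is enrolled, and who teaches a course) in one table.

```python
# Hypothetical un-normalized relation: each row stores a student's enrollment
# together with the instructor of the course.
enrollment = [
    {"student": "Asha", "course": "DBMS", "instructor": "Rao"},
    {"student": "Ravi", "course": "DBMS", "instructor": "Rao"},
    {"student": "Ravi", "course": "OS", "instructor": "Iyer"},
]

def instructors(rows):
    """Return the (course, instructor) facts that can still be read from rows."""
    return {(r["course"], r["instructor"]) for r in rows}

# Ravi drops OS, so his enrollment row is deleted.
after_delete = [r for r in enrollment
                if not (r["student"] == "Ravi" and r["course"] == "OS")]

known_before = instructors(enrollment)
known_after = instructors(after_delete)   # the fact "Iyer teaches OS" is gone
```

Deleting one enrollment also removed the only record of who teaches OS, which is exactly the unintended loss described above.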

Purpose of Normalization

• Normalization may be simply defined as a refinement process. It includes creating tables and establishing relationships between those tables according to rules designed both to protect data and to make the database more flexible by eliminating two factors:

• Redundancy
• Inconsistent Dependency

Without normalization, a database system may be inaccurate, slow, and inefficient, and it might not produce the data we expect. Normalization provides a series of normal form tests that can be carried out on individual relation schemas so that the relational database can be normalized to any desired degree.

● 1NF → 2NF → 3NF → BCNF

Conclusion

Just as every paragraph must have a single idea, every table must have a single idea. If a table contains more than one idea, it must be decomposed until each table contains only one idea.

We need a tool to approach this decomposition (normalization) on a large database that contains a number of tables, and that tool is functional dependencies.

FUNCTIONAL DEPENDENCY

• A formal tool for analysis of relational schemas.
• In a relation R, if α ⊆ R and β ⊆ R, then an attribute or set of attributes α functionally determines an attribute or set of attributes β (written α → β) iff each α value is associated with precisely one β value.
• That is, for all pairs of tuples t₁ and t₂ in R: if t₁[α] = t₂[α], then t₁[β] = t₂[β].
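
The pairwise condition above can be checked directly on a relation instance. The following is a minimal sketch; the function name `fd_holds` and the sample instance `r` are assumptions made for illustration, and `r` is not the instance referred to in the question that follows.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in the given relation instance.

    rows: list of dicts (one per tuple); lhs, rhs: lists of attribute names.
    """
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True

# Hypothetical relation instance.
r = [
    {"A": 1, "B": 10, "C": "x"},
    {"A": 1, "B": 10, "C": "y"},
    {"A": 2, "B": 20, "C": "x"},
]

a_to_b = fd_holds(r, ["A"], ["B"])   # A -> B holds in this instance
a_to_c = fd_holds(r, ["A"], ["C"])   # A -> C fails: A = 1 appears with both x and y
```

Note that an instance can only refute a functional dependency; a dependency that holds in one instance is not thereby guaranteed to hold in the schema.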

Q. Consider the following relation instance. Which of the following dependencies does not hold?

A) A → B
B) BC → A
C) B → C
D) AC → B

Trivial Functional Dependency - If β is a subset of α, then the functional dependency α → β will always hold.

(In other words, a dependency whose presence or absence makes no difference.)

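ATTRIBUTE CLOSURE / CLOSURE ON ATTRIBUTE SET / CLOSURE SET OF ATTRIBUTES

• The attribute closure of an attribute set X is the set of all attributes that can be functionally determined from X using the set of functional dependencies F.

• Denoted by X⁺ (F⁺, by contrast, denotes the closure of the FD set F itself).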
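
A minimal sketch of computing the closure of an attribute set follows. It assumes single-letter attribute names, so that a string such as "CD" stands for the set {C, D}; the function name `closure` and the FD set `fds` are hypothetical.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)      # lhs is already determined, so rhs is too
                changed = True
    return result

# Hypothetical FD set on R(A, B, C, D, E): A -> B, B -> C, CD -> E
fds = [("A", "B"), ("B", "C"), ("CD", "E")]

closure_a = closure("A", fds)     # attributes determined by A alone
closure_ad = closure("AD", fds)   # attributes determined by A and D together
```

Here the closure of A stops at {A, B, C} because D is never obtained, while the closure of AD reaches every attribute.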

ARMSTRONG'S AXIOMS

1. An axiom or postulate is a statement that is taken to be true, to serve as a premise or starting point for further reasoning and arguments.

2. Armstrong's axioms are a set of axioms (or, more precisely, inference rules) used to infer all the functional dependencies on a relational database. They were developed by William W. Armstrong in his 1974 paper.

3. The axioms are sound in generating only functional dependencies in the closure of a set of functional dependencies (denoted as F⁺) when applied to that set (denoted as F).

Armstrong's Axioms

• Reflexivity: If Y is a subset of X, then X → Y

• Augmentation: If X → Y, then XZ → YZ

• Transitivity: If X → Y and Y → Z, then X → Z

From these rules, we can derive these secondary rules:

• Union: If X → Y and X → Z, then X → YZ

• Decomposition: If X → YZ, then X → Y and X → Z

• Pseudo transitivity: If X → Y and WY → Z, then WX → Z

• Composition: If X → Y and Z → W, then XZ → YW

Why Armstrong's axioms are called Sound and Complete

By sound, we mean that given a set of functional dependencies F specified on a relation schema R, any dependency that we can infer from F by using the primary rules of Armstrong's axioms holds in every relation state r of R that satisfies the dependencies in F.

By complete, we mean that using the primary rules of Armstrong's axioms repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Equivalence of Two FD sets

Two FD sets F₁ and F₂ are equivalent if

F₁⁺ = F₂⁺

or, equivalently,

F₁ ⊆ F₂⁺ and F₂ ⊆ F₁⁺
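
The second form of the test is the practical one: each FD of one set is checked against attribute closures computed under the other set. The sketch below assumes single-letter attribute names; the function names and the FD sets `f` and `g` are illustrative assumptions and are not taken from the question that follows.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, g):
    """True if every FD in g can be inferred from f, i.e. g is contained in f's closure."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    """Two FD sets are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)

# Hypothetical FD sets.
f = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
g = [("A", "CD"), ("E", "AH")]
h = [("A", "C")]

f_equals_g = equivalent(f, g)   # each FD of one set follows from the other set
f_equals_h = equivalent(f, h)   # f covers h, but h cannot derive E -> H
```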

Q. Consider the following sets of FDs on R(ACDEH)

To find the MINIMAL COVER / CANONICAL COVER / IRREDUCIBLE SET

A canonical cover (also known as a minimal cover) for a set of functional dependencies in a database is a minimal set of functional dependencies that is equivalent to the original set, but with redundant dependencies and extraneous attributes removed. It is used in the normalization process of database design to simplify the set of functional dependencies and to find a good set of relations.

The set of functional dependencies may contain any of the following types of redundancy:

• A complete production may be redundant.

• One or more attributes may be redundant on the right-hand side of a production.

• One or more attributes may be redundant on the left-hand side of a production.

Q. R(ABCD)

A → B

C → B

D → ABC

AC → D

Q. R(A, B, C)

A → B

B → C

A → C

AB → B

AB → C

AC → B
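
The three kinds of redundancy can be removed mechanically. The following is a sketch of one common procedure, assuming single-letter attribute names; the function names are hypothetical, and the order in which redundant FDs are tried can change the result when several minimal covers exist. It is applied to the two FD sets listed above.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    """Return one minimal cover as a set of (lhs, single rhs attribute) pairs."""
    # Step 1: split right-hand sides so every FD has a single attribute on the right.
    g = list(dict.fromkeys((frozenset(lhs), a) for lhs, rhs in fds for a in rhs))

    # Step 2: drop extraneous attributes from left-hand sides.
    reduced = []
    for lhs, a in g:
        for x in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {x}, g):
                lhs = lhs - {x}
        reduced.append((lhs, a))
    g = list(dict.fromkeys(reduced))

    # Step 3: drop FDs that follow from the remaining ones (this also removes trivial FDs).
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return {("".join(sorted(lhs)), a) for lhs, a in g}

cover_abcd = minimal_cover([("A", "B"), ("C", "B"), ("D", "ABC"), ("AC", "D")])
cover_abc = minimal_cover([("A", "B"), ("B", "C"), ("A", "C"),
                           ("AB", "B"), ("AB", "C"), ("AC", "B")])
```

For R(A, B, C) the sketch keeps only A → B and B → C; for R(ABCD) it drops D → B, which already follows from D → A and A → B.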

KEY

Super key

A set of attributes using which we can identify each tuple uniquely is called a super key.

Let X be a set of attributes in a relation R. If X⁺ (the closure of X) contains all attributes of R, then X is said to be a super key of R.

There is at least one super key in every relation.

Candidate key

A minimal set of attributes using which we can identify each tuple uniquely is called a candidate key. A super key is called a candidate key if no proper subset of it is a super key. It is also called a MINIMAL SUPER KEY.

There is at least one candidate key in every relation.

Prime attribute - Attributes that are members of at least one candidate key are called prime attributes.

Primary key

The candidate key selected by the database administrator as the primary means of identifying tuples is called the primary key. Primary key attributes are not allowed to have null values. There is exactly one primary key per table in an RDBMS.

Candidate keys which are not chosen as the primary key are called alternate keys.

Foreign Keys

A foreign key is a column or group of columns in a relational database table that refers to the primary key of the same table or of some other table, in order to represent a relationship.

The concept of referential integrity is derived from foreign key theory.

Composite key - A composite key is a key composed of more than one column. It is sometimes also known as a concatenated key.

Secondary key - A secondary key is a key used to speed up search and retrieval. Contrary to a primary key, a secondary key does not necessarily contain unique values.

NORMAL FORMS

FIRST NORMAL FORM

• 1NF is the initial step of database normalization.

• Implications of first normal form:

• Atomic Values: Each cell in a table contains indivisible, atomic values. This means a relation should not contain any multivalued or composite attributes.

• Unique Columns: Each column must have a distinct name to identify the data it contains.

• Primary Key: A table in 1NF should have a primary key that uniquely identifies each record.

• Eliminating Duplicates: Duplicate rows are removed to prevent data redundancy.

Prime attribute: An attribute is said to be prime if it is part of any candidate key.

Non-Prime attribute: An attribute is said to be non-prime if it is not part of any candidate key.

E.g. R(ABCD)

AB → CD

Here the candidate key is AB, so A and B are prime attributes, and C and D are non-prime attributes.
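
Candidate keys, and from them the prime and non-prime attributes, can be found by brute force for small schemas. This is a sketch under the assumption of single-letter attribute names; the function names are hypothetical, and the approach is exponential in the number of attributes, so it is only suitable for classroom-sized examples such as R(ABCD) with AB → CD.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Return all candidate keys as strings, trying smaller attribute sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key: a super key, but not minimal
            if closure(combo, fds) == set(attributes):
                keys.append("".join(combo))
    return keys

keys = candidate_keys("ABCD", [("AB", "CD")])
prime = set("".join(keys))          # attributes appearing in some candidate key
non_prime = set("ABCD") - prime
```

A relation can have several candidate keys; for instance, the same function applied to R(A, B, C) with AB → C and C → B reports both AB and AC.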

PARTIAL DEPENDENCY - When a non-prime attribute is dependent on only a part (proper subset) of a candidate key, it is called a partial dependency. (PART OF KEY → NON-PRIME)

FULL DEPENDENCY - When a non-prime attribute is dependent on the entire candidate key, it is called a full dependency.

E.g. R(ABCD)

AB → D

A → C

Here AB is the candidate key, so A → C is a partial dependency.

SECOND NORMAL FORM

• A relation R is in 2NF if:

• R is in 1NF.

• R does not contain any partial dependency (that is, every non-prime attribute is fully dependent upon each candidate key).
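
The 2NF condition can be tested by looking at what each proper subset of a candidate key determines. The sketch below assumes single-letter attribute names and that the candidate keys are already known; the function names are hypothetical. It uses the example R(ABCD) with AB → D and A → C, whose only candidate key is AB.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(fds, keys):
    """Return (part of key, non-prime attribute) pairs that violate 2NF."""
    prime = set("".join(keys))
    found = set()
    for key in keys:
        for size in range(1, len(key)):            # proper, non-empty subsets only
            for part in combinations(key, size):
                for attr in closure(part, fds) - prime:
                    found.add(("".join(part), attr))
    return found

violations = partial_dependencies([("AB", "D"), ("A", "C")], ["AB"])
in_2nf = not violations

no_violations = partial_dependencies([("AB", "CD")], ["AB"])
```

The first schema is not in 2NF because C depends on A alone; the second has no partial dependency.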

Q. R(A, B, C)  B → C


TRANSITIVE DEPENDENCY - A functional dependency from a non-prime attribute to a non-prime attribute is called a transitive dependency.

E.g. - R(A, B, C, D) with A as the candidate key

A → B

B → C [transitive dependency]

C → D [transitive dependency]

THIRD NORMAL FORM

● A relational schema R is said to be in 3NF if:

● R is in 2NF.

● R does not contain any transitive dependency.

THIRD NORMAL FORM DIRECT DEFINITION

• A relational schema R is said to be in 3NF if, for every non-trivial functional dependency α → β in R, either α is a super key or β is a prime attribute.

E.g. R(A, B, C)

A → B

B → C

BCNF (BOYCE CODD NORMAL FORM)

• A relational schema R is said to be in BCNF if, for every non-trivial functional dependency α → β in R, α is a super key.

E.g. R(A, B, C)

AB → C

C → B
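
Both direct definitions can be checked by examining each given FD in turn. The sketch below assumes single-letter attribute names and that the prime attributes are supplied by the caller; the function name `violations` is hypothetical. It is run on the two R(A, B, C) examples from the 3NF and BCNF sections.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violations(all_attrs, fds, prime):
    """Return (bcnf_violations, third_nf_violations) among the given FDs."""
    bcnf, third = [], []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs) or closure(lhs, fds) == set(all_attrs):
            continue                      # trivial FD, or determinant is a super key
        bcnf.append((lhs, rhs))           # determinant is not a super key
        if not (set(rhs) - set(lhs)) <= set(prime):
            third.append((lhs, rhs))      # and some dependent attribute is non-prime
    return bcnf, third

# A -> B, B -> C: the only candidate key is A, so the prime attributes are {A}.
bcnf_1, third_1 = violations("ABC", [("A", "B"), ("B", "C")], "A")

# AB -> C, C -> B: candidate keys are AB and AC, so every attribute is prime.
bcnf_2, third_2 = violations("ABC", [("AB", "C"), ("C", "B")], "ABC")
```

The first schema fails both tests because of B → C. The second is in 3NF but not in BCNF, since C is not a super key while B is prime.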

Some important points on Normalization:

• A relation with two attributes is always in BCNF.

• If a relation schema R consists of only prime attributes, then R is always in 3NF, but may or may not be in BCNF.

Q. Consider the universal relational schema R (A, B, C, D, E, F, G, H, I, J) and the following set of functional dependencies.

F = {AB → C, A → DE, B → F, F → GH, D → IJ}. Determine the keys for R.

Decompose R into 2nd normal form.


Multivalued Dependency

• Denoted by A →→ B. It means that for every value of A, there may exist more than one value of B.

E.g. S_Name →→ Club_Name

S_Name Club_Name
Kamesh Dance
Kamesh Guitar

A trivial multivalued dependency X →→ Y is one where either Y is a subset of X, or X and Y together form the whole set of attributes of the relation.

E.g. let the constraint specified by MVD in relation Student as

NOTE: The above Student schema is in BCNF as no functional dependency holds on Student, but there is still redundancy due to the MVD.

Each row indicates that a given restaurant can deliver a given variety to a given delivery area. The table has no non-key attributes because its only key is {Restaurant, Variety, Delivery Area}. Therefore, it meets all normal forms up to BCNF.

If we assume, however, that the varieties offered by a restaurant are not affected by delivery area (i.e. a restaurant offers all the varieties it makes to all areas it supplies), then the table does not meet 4NF. The problem is that the table features two non-trivial multivalued dependencies on the {Restaurant} attribute (which is not a super key). The dependencies are:

• {Restaurant} →→ {Variety}

• {Restaurant} →→ {Delivery Area}

If we have two or more independent multivalued attributes in the same relation schema, we have to repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent and to maintain the independence among the attributes involved. This constraint is specified by a multivalued dependency.

A relation is in 4NF iff

• It is in BCNF

• There must not exist any non-trivial multivalued dependency.

•Each MVD is decomposed in separate table, where it becomes trivial


MVD.
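The restaurant example can be illustrated with a few rows. The data below is invented for the sketch (the original table is not reproduced in these notes); it shows that when the two MVDs hold, the table splits into two smaller tables whose natural join gives back exactly the original rows.

```python
from itertools import product

# Hypothetical data: each restaurant offers every one of its varieties in every one of its areas.
varieties_of = {"A1 Pizza": ["Thick Crust", "Stuffed Crust"], "Elite Pizza": ["Thin Crust"]}
areas_of = {"A1 Pizza": ["Springfield", "Shelbyville"], "Elite Pizza": ["Capital City"]}

table = {(r, v, a)
         for r in varieties_of
         for v, a in product(varieties_of[r], areas_of[r])}

# 4NF decomposition: one table per multivalued dependency.
restaurant_variety = {(r, v) for r, v, _ in table}
restaurant_area = {(r, a) for r, _, a in table}

# Natural join of the two projections on Restaurant.
rejoined = {(r, v, a)
            for r, v in restaurant_variety
            for r2, a in restaurant_area
            if r == r2}

print(len(table), len(restaurant_variety), len(restaurant_area))
print(rejoined == table)
```

In each of the two smaller tables the MVD spans all the attributes, so it is trivial there.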

Lossy/Lossless-Dependency Preserving Decomposition

• Normalization decomposes a table into two or more tables. During this decomposition we must ensure that some properties are satisfied, the most important of which is the lossless join property.

If we decompose a table r into two tables r₁ and r₂ because of normalization, and at some later stage combine r₁ and r₂ again with a natural join, we must get back the original table r, without any extra or missing tuples. If the join does not reproduce r, information has been lost.

• Decomposition is lossy if R₁ ⋈ R₂ ⊃ R

• Decomposition is lossy if R ⊃ R₁ ⋈ R₂ (when R₁ and R₂ are projections of R this case cannot arise, because R ⊆ R₁ ⋈ R₂ always holds)

• Decomposition is lossless if R₁ ⋈ R₂ = R

"The decomposition of relation R into R₁ and R₂ is lossless when the join of R₁ and R₂ yields the same relation as R." This guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

This property is extremely critical and must be achieved at any cost.

Lossless Decomposition / Non-Additive Join Decomposition

To check for a lossless join decomposition using the FD set, the following conditions must hold:

• The union of the attributes of R₁ and R₂ must be equal to the attributes of R. Each attribute of R must be either in R₁ or in R₂. Att(R₁) ∪ Att(R₂) = Att(R)

• The intersection of the attributes of R₁ and R₂ must not be empty. Att(R₁) ∩ Att(R₂) ≠ Φ

• The common attributes must form a (super) key for at least one relation (R₁ or R₂). Att(R₁) ∩ Att(R₂) → Att(R₁) or Att(R₁) ∩ Att(R₂) → Att(R₂)

Q. R (A, B, C, D)

A → B, B → C, C → D, D → A

R₁(A, B), R₂(B, C) and R₃(C, D)
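The three conditions above can be applied pair by pair. The sketch below assumes the dependencies of this question are A → B, B → C, C → D, D → A, and it joins the three relations in two binary steps. That is a sufficient check rather than a general n-way test, and `lossless_binary` is an illustrative name.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under a list of FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary test: the common attributes must determine all of r1 or all of r2."""
    common = set(r1) & set(r2)
    if not common:
        return False
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined


F = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
step1 = lossless_binary("AB", "BC", F)   # common attribute B
step2 = lossless_binary("ABC", "CD", F)  # common attribute C
print(step1, step2)
```

Both steps succeed because every single attribute is a key under these dependencies, so the decomposition into R₁, R₂ and R₃ is lossless.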

Q. R(ABCDE) (NF) R₁(AB) R₂(BC) R₃(ABCD) R₄(EG)

A→BC

C→DE

DE→E

5 NF/Project-Join Normal Form

• A Relational table R is said to be in 5th normal form if

• it is in 4 NF

• it cannot be further non-loss decomposed


Dependency Preserving Decomposition

Let relation R be decomposed into relations R₁, R₂, R₃, ..., R_N with their respective functional dependency sets F₁, F₂, F₃, ..., F_N. Then the decomposition is dependency preserving iff

(F₁ ∪ F₂ ∪ F₃ ∪ ... ∪ F_N)⁺ = F⁺

• Dependency preservation property, although desirable, is sometimes


sacrificed.

X = PQRS
F = {QR → S, R → P, S→ Q}
Y = (PR) and Z = (QRS)
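For the example above, dependency preservation can be tested without computing F⁺ explicitly. The sketch uses the usual textbook procedure: grow the left side of each dependency using only what each decomposed relation can see. The function names are illustrative.

```python
def closure(attrs, fds):
    """Return the closure of a set of attributes under a list of FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(fd, parts, fds):
    """Check whether one FD can still be enforced using the decomposed relations only."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            attrs = set(part)
            gained = closure(result & attrs, fds) & attrs
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


F = [("QR", "S"), ("R", "P"), ("S", "Q")]
parts = ["PR", "QRS"]
report = {fd: preserved(fd, parts, F) for fd in F}
print(report)
```

Every dependency is preserved here: R → P lives in Y, while QR → S and S → Q live in Z.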

Indexing

Relational databases are based on set theory.

● In set theory, the order of elements is unimportant, similarly in


database tables.

● However, in practical implementation, element order in tables is


often specified.

● Various operations such as search, insertion, and deletion are


influenced by the order of elements in the tables.

● Elements in a table can be stored in two ways: sorted (ordered) or


unsorted (unordered).
File organization / organization of records in a file
Ordered file organization

• All the records in the file are ordered on some search key field.

• Binary search is possible here (much like finding a page in a book).

• Maintenance (insertion and deletion) is costly, as it may require reorganization of the entire file.

• Note that binary search is available only when we search on the key on which the file is ordered; for any other field the file behaves like an unsorted file.

• If the file is ordered, the number of block accesses required to reach the block that contains the desired record is O(log n), where n is the number of blocks.

Unordered file organization

Records are typically added at the end of the file, without following any
specific order.

This insertion method allows only linear search, resulting in slower


search times.
Despite slow searches, maintenance including insertion and deletion is
simpler.

No reorganization of the entire file is needed, making maintenance


easier.

If the file is unordered, the number of block accesses required to reach the block that contains the desired record is O(n), where n is the number of blocks.

Indexes are supplementary structures in databases, aiding in swift


record retrieval.

They enable quick data access based on particular attributes identified


for indexing.

This technique is similar to the index sections seen in books.

Indexes provide secondary pathways to access records without


changing their physical position in the main file.
Q. Suppose we have an ordered file with r = 30,000 records stored on a disk with block size B = 1024 B. File records are of fixed size and are unspanned, with record length R = 100 B. Suppose that the ordering key field of the file is 9 B long and a block pointer is 6 B long. Implement primary indexing?
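The arithmetic for this question can be laid out step by step. This is a sketch of the standard calculation, assuming one index entry (key + block pointer) per data block and binary search over whole blocks. The variable names are chosen for the example.

```python
import math

r = 30_000        # number of records
B = 1024          # block size in bytes
R = 100           # record length in bytes
key_size = 9      # ordering key field, bytes
pointer_size = 6  # block pointer, bytes

# Main (data) file
records_per_block = B // R                         # unspanned, so round down
data_blocks = math.ceil(r / records_per_block)
search_without_index = math.ceil(math.log2(data_blocks))

# Primary index: one entry per data block
entry_size = key_size + pointer_size
entries_per_block = B // entry_size
index_blocks = math.ceil(data_blocks / entries_per_block)
search_with_index = math.ceil(math.log2(index_blocks)) + 1  # +1 to read the data block

print(records_per_block, data_blocks, search_without_index)
print(entry_size, entries_per_block, index_blocks, search_with_index)
```

Under these assumptions the binary search on the data file needs 12 block accesses, while the search through the primary index needs 7.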

1. Indexes can be established on any relation field, be it primary key or


non-key.

2. Each attribute can have a dedicated index file, meaning multiple index
files may exist for one main file.

3. Index files are always ordered, so the advantages of binary search are available irrespective of the main file's order.

4. Indexing shortens data retrieval time but also introduces space overhead for storing the index file.

5. The correct block in the main file can be located with ⌈log₂(number of blocks in index file)⌉ + 1 accesses.

Types of Indexing

In single-level indexing, an index file is created for the main file, marking
the end of the indexing process.

• Multiple-level indexing, on the other hand, involves creating an index


for the index file and continually repeating this procedure until only a
single block remains
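The repeat-until-one-block idea can be sketched as a short loop. The function name `index_levels` and the sample figures (3,000 first-level entries, 68 entries per index block) are illustrative assumptions, not values fixed by the text.

```python
import math


def index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each index level, bottom level first."""
    if fan_out < 2:
        raise ValueError("fan_out must be at least 2")
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:
            break
        entries = blocks  # the next level has one entry per block of this level
    return levels


levels = index_levels(3000, 68)
accesses = len(levels) + 1  # one block per level, plus the data block
print(levels, accesses)
```

With these numbers the index has two levels, so a lookup reads one block per level plus the data block.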
PRIMARY INDEXING

The main file is always sorted according to the primary key.

Indexing is done on the primary key, hence the name primary indexing.

The index file has two columns: the primary key and an anchor pointer (the base address of a block).

It is an example of sparse indexing.

• The first record (anchor record) of every block gets an entry in the index file.

No. of entries in the index file = no. of blocks occupied by the main file.

CLUSTERED INDEXING

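The main file is ordered on some non-key attribute.

No. of entries in the index file = no. of unique values of the attribute on which indexing is done.

It is an example of sparse as well as dense indexing: dense because every distinct value of the attribute has an entry, sparse because not every record has one.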

SECONDARY INDEXING

• The most common scenario: we already have a primary index on the primary key, but there are frequent queries on some other attribute, so we decide to build one more index file on that attribute.
• The main file is not ordered on the attribute on which indexing is done (unordered).

• Secondary indexing can be done on a key or a non-key attribute.

• The number of entries in the index file is the same as the number of entries in the main file.

• It is an example of dense indexing.

Dense Vs Sparse

Dense Index - In a dense index, there is an entry in the index file for every search key value in the main file. This makes searching faster but requires more space to store the index records themselves. Note that the entry is per search key value, not per record.

Sometimes the number of records in the main file > the number of search keys in the main file, for example if a search key value is repeated.

Sparse Index - If an index entry is created only for some records of the main file, it is called a sparse index. No. of index entries in the index file < no. of records in the main file. Note: dense and sparse are not complementary to each other; an index can be both, dense with respect to search key values and sparse with respect to records.
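The difference between the two kinds of index shows up clearly on a tiny sorted file. The sketch below uses made-up key values and a block size of three records. The dense index is a plain dictionary and the sparse index keeps only the first (anchor) key of each block.

```python
from bisect import bisect_right

records = [5, 8, 12, 17, 21, 30, 34, 41, 47]  # search key values, already sorted
BLOCK_SIZE = 3
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one entry per search key value.
dense_index = {key: b for b, block in enumerate(blocks) for key in block}

# Sparse index: one entry per block (its first key).
sparse_index = [(block[0], b) for b, block in enumerate(blocks)]


def sparse_lookup(key):
    """Find the last anchor <= key, then scan that single block."""
    anchors = [anchor for anchor, _ in sparse_index]
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None
    block_no = sparse_index[pos][1]
    return block_no if key in blocks[block_no] else None


print(len(dense_index), len(sparse_index))
print(dense_index[21], sparse_lookup(21), sparse_lookup(20))
```

The sparse index is a third of the size here, at the cost of scanning one block after the index lookup.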
B tree

A non-empty B-tree of order m is an m-way search tree in which:

The root is either a leaf or has at least two and at most m child nodes.

The internal nodes except the root have at least ⌈m/2⌉ child nodes and at most m child nodes.

The number of keys in each internal node is one less than the number of child nodes, and these keys partition the subtrees of the node in a manner similar to that of an m-way search tree.

All leaf nodes are on the same level (perfectly balanced).

Q. Consider the following elements 5, 10, 12, 13, 14, 1, 2, 3, 4 and insert them into an empty B-tree of order = 3.
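The insertion exercise can be traced with a small program. This is a minimal sketch of B-tree insertion with splitting on overflow, assuming distinct keys and no deletion. `Node`, `insert` and `levels` are names chosen for the example.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list for a leaf


def _insert(node, key, order):
    """Insert key below node; return (median, right_node) if node had to split."""
    i = bisect_left(node.keys, key)
    if not node.children:                      # leaf: place the key directly
        node.keys.insert(i, key)
    else:                                      # internal: descend into child i
        split = _insert(node.children[i], key, order)
        if split:
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) < order:                 # at most order - 1 keys: no overflow
        return None
    mid = len(node.keys) // 2                  # overflow: push the median up
    median = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return median, right


def insert(root, key, order):
    split = _insert(root, key, order)
    if split:                                  # root split: the tree grows by one level
        median, right = split
        return Node([median], [root, right])
    return root


def levels(root):
    """Return the keys of every node, level by level."""
    out, frontier = [], [root]
    while frontier:
        out.append([node.keys for node in frontier])
        frontier = [child for node in frontier for child in node.children]
    return out


root = Node()
for k in [5, 10, 12, 13, 14, 1, 2, 3, 4]:
    root = insert(root, k, order=3)
tree = levels(root)
print(tree)
```

For this sequence the root ends up holding 10, its two children hold (2, 4) and (13), and the five leaves hold 1, 3, 5, 12 and 14, all on the same level.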
Query Language

• After designing a database (an ER diagram, followed by conversion to the relational model, followed by normalization and indexing), the next task is to store, retrieve and modify data in the database.

Here we concentrate mainly on the retrieval part, for which query languages are used. Query languages, data query languages or database query languages (DQLs) are computer languages with which a user requests information from the database. A well-known example is the Structured Query Language (SQL).

Procedural Query Language

• Here the user instructs the system to perform a sequence of operations on the database in order to compute the desired result.

That is, the user specifies both what data is to be retrieved and how it is to be retrieved, e.g. Relational Algebra.
Non-Procedural Query Language

• In nonprocedural language, the user describes the desired information


without giving a specific procedure for obtaining that information.

The user specifies only what data is to be retrieved, e.g. Relational Calculus. Tuple relational calculus and domain relational calculus are declarative query languages based on mathematical logic.

Relational Algebra (procedural) and Relational Calculus (non-procedural) are mathematical systems / query languages used for querying the relational model.

• RA and RC are not executed on any computer; they provide the fundamental mathematics on which SQL is based.

SQL (Structured Query Language) works on an RDBMS, and it includes elements of both procedural and non-procedural query languages.
RELATIONAL ALGEBRA

• RA, like any other mathematical system, provides a number of operators that use relations (tables) as operands and produce a new relation as their result.

Every operator in RA accepts one or two relations/tables as input arguments and always returns a single relation instance, without a name, as the result.

It does not keep duplicates by default, as it is based on set theory. When the same query is written in RA and in SQL, the results may differ because SQL keeps duplicates.

As it is pure mathematics, no English keywords are used. Operators are represented using symbols.
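The point about duplicates is easy to see in practice. The sketch below uses Python's built-in `sqlite3` module with a small invented `depositor` table. A plain `SELECT` keeps repeated names, while `SELECT DISTINCT` gives the set-style answer that relational algebra would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE depositor (customer_name TEXT, account_number TEXT)")
conn.executemany(
    "INSERT INTO depositor VALUES (?, ?)",
    [("Asha", "A-101"), ("Asha", "A-215"), ("Ravi", "A-102")],  # hypothetical rows
)

# SQL default: duplicates are kept.
with_duplicates = conn.execute(
    "SELECT customer_name FROM depositor ORDER BY customer_name"
).fetchall()

# Set semantics, as in relational algebra.
without_duplicates = conn.execute(
    "SELECT DISTINCT customer_name FROM depositor ORDER BY customer_name"
).fetchall()

print(with_duplicates)
print(without_duplicates)
```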
BASIC/FUNDAMENTAL OPERATORS

• The fundamental operations in the relational algebra are select, project,


union, set difference, Cartesian product, and Rename.

The select, project, and rename operations are called unary operations,
because they operate on one relation.

• Union, Cartesian product and set difference operate on pairs of


relations and are, therefore, called binary operations.

• Relational algebra also provides the framework for query optimization.

Relational Schema - A relation schema R, denoted by R (A1, A2, ..., An), is made up of a relation name R and a list of attributes A1, A2, ..., An. Each attribute Ai is the name of a role played by some domain D in the relation schema R. It is used to describe a relation.

E.g. the schema representation of the table Student is STUDENT (NAME, ID, CITY, COUNTRY, HOBBY).

Relational Instance - A relation together with its data at a particular instant of time.


The Project Operation (Vertical Selection)

The main idea behind the project operator is to select the desired columns.

• The project operation is a unary operation that returns its argument relation, with certain attributes left out.

• Projection is denoted by the uppercase Greek letter pi (Π).

• The minimum number of columns selected can be 1; the maximum can be n, i.e. all the columns.

• Π column_name (table_name)

Q.Write a RELATIONAL ALGEBRA query to find the name of all


customer without duplication having bank account?

Q Write a RELATIONAL ALGEBRA query to find all the details of bank


branches?
The Select Operation (Horizontal Selection)

• The select operation selects tuples that satisfy a given


predicate/Condition p.

• Lowercase Greek letter sigma (σ) is used to denote selection.

It is a unary operator.

• Eliminates only tuples/rows.

σ condition (table_name)

Q. Write a RELATIONAL ALGEBRA query to find all account_no where balance is less than 1000?

Π account_number (σ balance < 1000 (account))

Q. Write a RELATIONAL ALGEBRA query to find the branch names situated in Delhi and having assets less than 1,00,000?

Π branch_name (σ (branch_city = "Delhi") ∧ (assets < 100000) (branch))

Selection is commutative in nature: σ p (σ q (r)) = σ q (σ p (r)).
• We allow comparisons using =, ≠, <, >, ≤ and ≥ in the selection predicate.

Using the connectives and (∧), or (∨), and not (¬), we can combine several predicates into a larger predicate.

The minimum number of tuples selected can be 0; the maximum can be all the tuples.
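Select and project can be imitated with two small functions. This is a sketch over an invented `account` relation stored as a list of dictionaries. `select` and `project` are illustrative helpers, not part of any library.

```python
account = [  # hypothetical rows
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-215", "branch_name": "Mianus", "balance": 700},
    {"account_number": "A-305", "branch_name": "Round Hill", "balance": 1500},
]


def select(relation, predicate):
    """Horizontal selection: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, *attrs):
    """Vertical selection: keep the named columns and drop duplicate tuples."""
    result = []
    for row in relation:
        out = {a: row[a] for a in attrs}
        if out not in result:
            result.append(out)
    return result


# Account numbers with a balance below 1000.
low_balance = project(select(account, lambda t: t["balance"] < 1000), "account_number")
print(low_balance)

# Selection is commutative: the order of two selections does not matter.
p = lambda t: t["balance"] < 1000
q = lambda t: t["branch_name"] != "Mianus"
print(select(select(account, p), q) == select(select(account, q), p))
```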

The Union Operation

• It is a binary operation, denoted, as in set theory, by ∪.

• Written as Expression₁ ∪ Expression₂, where r ∪ s = {t | t ∈ r or t ∈ s}

For a union operation r ∪ s to be valid, we require that two conditions hold:

• The relations r and s must be of the same arity. That is, they must have the same number of attributes.

• The domains of the ith attribute of r and the ith attribute of s must be the same, for all i.
Q. Write a RELATIONAL ALGEBRA query to find all the customer names who have a loan or an account or both?

Π customer_name (depositor) ∪ Π customer_name (borrower)

Q. Write a RELATIONAL ALGEBRA query to find all the customer names who have a loan but do not have an account?

Π customer_name (borrower) − Π customer_name (depositor)

Some points to remember

• Deg(R ∪ S) = Deg(R) = Deg(S)

• max(|R|, |S|) ≤ |R ∪ S| ≤ |R| + |S|

The Set-Difference Operation

• The set-difference operation, denoted by -, allows us to find tuples that


are in one relation but are not in another. It is a binary operator.

The expression r - s produces a relation containing those tuples in r but


not in s.

We must ensure that set differences are taken between compatible


relations.
For a set-difference operation r − s to be valid, we require that the relations r and s be of the same arity, and that the domains of the ith attribute of r and the ith attribute of s be the same, for all i.

• 0 ≤ |R − S| ≤ |R|
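Union and set difference map directly onto Python sets. The sketch below uses invented customer names stored as one-attribute tuples, and also checks the size bounds stated for both operators.

```python
# Hypothetical one-column relations: Π customer_name of depositor and borrower.
depositor = {("Asha",), ("Ravi",), ("Meena",)}
borrower = {("Ravi",), ("Kiran",)}

loan_or_account = depositor | borrower   # union
loan_only = borrower - depositor         # set difference

print(sorted(loan_or_account))
print(sorted(loan_only))

# Size bounds for union and difference.
print(max(len(depositor), len(borrower)) <= len(loan_or_account) <= len(depositor) + len(borrower))
print(0 <= len(loan_only) <= len(borrower))
```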

The Cartesian-Product Operation

• The Cartesian-product operation, denoted by a cross (×), allows us to combine information from any two relations.

It is a binary operator; we write the Cartesian product of relations R₁ and R₂ as R₁ × R₂.

The Cartesian-product operation associates every tuple of R₁ with every tuple of R₂.

R₁ × R₂ = {rs | r ∈ R₁ and s ∈ R₂} contains one tuple <r, s> (the concatenation of tuples r and s) for each pair of tuples r ∈ R₁, s ∈ R₂.
Q. Write a RELATIONAL ALGEBRA query to find the names of all the customers, along with their account balance, who have an account in the bank?

Π customer_name, balance (σ account.account_number = depositor.account_number (account × depositor))

R₁ X R₂ returns a relational instance whose schema contains all the


fields of R₁ (in order as they appear in R₁) and all fields of R₂ (in order as
they appear in R₂).

• If R₁ has m tuples and R₂ has n tuples, the result will have m*n tuples.

• The same attribute name may appear in both R₁ and R₂, so we need to devise a naming scheme to distinguish between these attributes.
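The customer-and-balance query can be followed step by step: product first, then selection, then projection. The sketch uses two tiny invented relations stored as sets of tuples, with attribute positions noted in the comments.

```python
# Hypothetical relations.
account = {("A-101", 500), ("A-102", 400)}           # (account_number, balance)
depositor = {("Asha", "A-101"), ("Ravi", "A-102")}   # (customer_name, account_number)

# account × depositor: every tuple of one paired with every tuple of the other.
product = {a + d for a in account for d in depositor}

# σ account.account_number = depositor.account_number
matching = {t for t in product if t[0] == t[3]}

# Π customer_name, balance
result = {(t[2], t[1]) for t in matching}

print(len(product))   # m * n tuples
print(sorted(result))
```

Positions 0 and 3 both hold an account number, which is why the two columns need distinct names such as account.account_number and depositor.account_number.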
Rename Operation

• The results of relational algebra expressions are also relations, but without any name. A rename query does not change the name of the table in the original database; it creates a new, renamed copy of the table.

• The rename operation allows us to rename the output relation.

It is denoted by the lowercase Greek letter rho (ρ). The result of expression E is saved under the name x.

ρ x(A1, A2, A3, ..., AN) (E)

ρ Learner (Student)

ρ Learner(Stu_ID, User_Name, Age) (Student(Roll_No, Name, Age))

Additional/Derived Relational-Algebra Operations

• If we restrict ourselves to just the fundamental operations, certain


common queries are lengthy to express. Therefore, we use additional
operations.

These additional operations do not add any power to the algebra.

They are used to simplify the queries.

Q. Write a RELATIONAL ALGEBRA query to find all the customer name

who have both a loan and an account?

Π customer_name (depositor) ∩ Π customer_name (borrower)
The Natural-Join Operation (⋈)

• The natural join is a binary operation that allows us to combine certain selections and a Cartesian product into one operation.

The natural join is a lossy operator: tuples that have no matching tuple in the other relation do not appear in the result.

Q. Write a RELATIONAL ALGEBRA query to find the names of all the customers, along with their account balance, who have an account in the bank?

Π customer_name, balance (account ⋈ depositor)

The equality on the common attribute account_number is implicit in the natural join, so no explicit selection is needed.
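A natural join can be sketched in a few lines by matching rows on whatever attribute names the two relations share. The data is invented, and `natural_join` is an illustrative helper that assumes every row of a relation has the same attributes.

```python
def natural_join(r, s):
    """Combine rows of r and s that agree on all common attribute names."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]


account = [  # hypothetical rows
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-305", "balance": 1500},   # no depositor row for this one
]
depositor = [
    {"customer_name": "Asha", "account_number": "A-101"},
    {"customer_name": "Ravi", "account_number": "A-215"},
]

joined = natural_join(account, depositor)
result = [(row["customer_name"], row["balance"]) for row in joined]
print(result)
```

Account A-305 has no matching depositor row and therefore disappears from the result.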

In general, the DIVISION operation (÷) is applied when we have a query such as finding the students who have completed both the database1 and database2 tasks.
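Division can be built from product and difference: take all students, remove those who are missing at least one required task. The sketch below uses invented student names with the two task names from the text.

```python
# Hypothetical relation completed(student, task).
completed = {
    ("Asha", "database1"),
    ("Asha", "database2"),
    ("Ravi", "database1"),
}
tasks = {"database1", "database2"}   # the divisor: every task that is required

students = {s for s, _ in completed}                        # Π student (completed)
missing = {(s, t) for s in students for t in tasks} - completed
result = students - {s for s, _ in missing}                 # completed ÷ tasks

print(sorted(missing))
print(sorted(result))
```

Only students paired with every task in the divisor survive, which is exactly the "both" or "all" style of query.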
Introduction to SQL

Structured Query Language is a domain-specific language (not general


purpose) used in programming and design for managing data held in a
relational database management system (RDBMS).

Although we refer to the SQL language as a "query language," it can do much more than just query a database. It can define the structure of the database, modify data in the database, specify security constraints and perform a number of other tasks.

It was originally based upon the mathematical models of relational algebra (procedural) and tuple relational calculus (non-procedural).
Overview of the SQL Query Language

1. IBM developed the original version of SQL, originally called Sequel


(Structured English Query Language), as part of the System R project in
the early 1970s.

2. The Sequel language has evolved since then, and its name has
changed to SQL (Structured Query Language) (some other company
has trademark on the word sequel). SQL has clearly established itself as
the standard relational database language.

3. In 1986, the American National Standards Institute (ANSI) and the


International Organization for Standardization (ISO) published an SQL
standard, called SQL-86.

4. The subsequent versions of the standard were SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008, SQL:2011, SQL:2016, SQL:2019 and, most recently, SQL:2023.

Classification of database languages

1. Data Definition Language (DDL):

a. DDL is the set of SQL commands used to create, modify and delete database structures, but not data.

b. They are used by the DBA and, to a limited extent, by a database designer or application developer.

c. Create, drop, alter and truncate are commonly used DDL commands. Statements: CREATE, ALTER, DROP, TRUNCATE, COMMENT.
2. Data Manipulation Language (DML):

a. A DML is a language that enables users to access or manipulate data as organized by the appropriate data model.

b. There are two types of DMLs:

i. Procedural DMLs: They require a user to specify what data are needed and how to get those data.

ii. Declarative DMLs (non-procedural DMLs): They require a user to specify what data are needed without specifying how to get those data.

c. Insert, update, delete and query are commonly used DML commands. Statements: INSERT, UPDATE, DELETE.

3. Data Control Language (DCL):

a. It is the component of SQL whose statements control access to data and to the database.

b. GRANT and REVOKE are the DCL statements. (COMMIT and ROLLBACK control transactions and are usually classified separately as Transaction Control Language, TCL.)

4. Data Query Language (DQL):

a. It is the component of SQL that allows getting data from the database and imposing an ordering upon it.

b. It consists of the SELECT statement.

5. View Definition Language (VDL):

a. VDL is used to specify user views and their mapping to the conceptual schema.
b. It defines the subset of records available to classes of users.

c. It creates virtual tables, and a view appears to users like the conceptual level.

d. It specifies user interfaces.

SQL is not only a DML: the same language covers the DDL, DML, DCL, DQL and VDL roles listed above.

CREATE TABLE table_name (
    column1 data_type [constraints],
    column2 data_type [constraints],
    column3 data_type [constraints]
);

CREATE TABLE Students (
    StudentID INT PRIMARY KEY,
    FirstName VARCHAR(50),
    LastName VARCHAR(50),
    Age INT,
    Email VARCHAR(100)
);
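The Students definition above can be tried out without installing a database server. The sketch below runs it in an in-memory SQLite database through Python's built-in `sqlite3` module. The inserted rows are invented, and SQLite is only one possible engine (it accepts the `VARCHAR(n)` declarations but does not enforce the length).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Students (
        StudentID INT PRIMARY KEY,
        FirstName VARCHAR(50),
        LastName VARCHAR(50),
        Age INT,
        Email VARCHAR(100)
    )
""")

# Hypothetical row.
conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
             (1, "Asha", "Verma", 20, "asha@example.com"))

# A second row with the same StudentID violates the PRIMARY KEY constraint.
try:
    conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
                 (1, "Ravi", "Nair", 21, "ravi@example.com"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

rows = conn.execute("SELECT StudentID, FirstName FROM Students").fetchall()
print(rows, duplicate_rejected)
```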
A list of some common data types supported by SQL, along with a brief description of each:

Numeric Data Types:

1. `INT`: For storing integer values.

2. `SMALLINT`: A smaller range of integers compared to INT.

3. `BIGINT`: For storing larger integers.

4. `DECIMAL(p, s)`: For storing exact numerical values, where `p` is the precision and `s` is the scale.

5. `FLOAT`: For storing floating-point numbers.

6. `REAL`: A data type that can store floating-point numbers, generally with less precision compared to FLOAT.

String Data Types:

7. `VARCHAR(n)`: Variable-length character string, where `n` is the maximum length.

8. `CHAR(n)`: Fixed-length character string, where `n` is the length.

9. `TEXT`: For storing long text strings.
