
DBMS Unit-1

The document outlines the fundamentals of Database Management Systems (DBMS), including the structure, components, and applications of databases. It discusses the advantages of DBMS over traditional file systems, such as reduced redundancy and improved data integrity, while also covering data models, query processing, and transaction management. Additionally, it highlights the importance of the Entity-Relationship (ER) model in database design and the evolution of database systems over time.

Uploaded by

jacksharma189
Copyright © All Rights Reserved

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

UNIT-I
Course Name: Database Management System

Submitted By: Dr. Pankaj Jain


Outline
• Introduction
• Database vs file system
• View of data
• Data Models
• Database language
• Database Users and Administrators
• Transaction Management
• Components of DBMS
• ER Model
– Basic
– Constraints, keys, Design issues
– ER diagram
Database Management System
• DBMS contains information about a particular enterprise
– Collection of interrelated data
– Set of programs to access the data
– An environment that is both convenient and efficient to use
• Database Applications:
– Banking: transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Online retailers: order tracking, customized recommendations
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax deductions
• Databases can be very large.
• Databases touch all aspects of our lives
University Database Example
• Application program examples
– Add new students, instructors, and courses
– Register students for courses, and generate class
rosters
– Assign grades to students, compute grade point
averages (GPA) and generate transcripts
• In the early days, database applications
were built directly on top of file systems
Drawbacks of using file systems to store data

• Data redundancy and inconsistency
– Multiple file formats, duplication of information in different files
• Difficulty in accessing data
– Need to write a new program to carry out each new task
• Data isolation
– Multiple files and formats
• Integrity problems
– Integrity constraints (e.g., account balance > 0) become
“buried” in program code rather than being stated explicitly
– Hard to add new constraints or change existing ones
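A DBMS lets a constraint like "account balance > 0" be stated explicitly in the schema instead of being buried in program code. Below is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS. The table `account`, its columns and the helper `try_insert` are hypothetical names. Every program that writes to the table is checked by the DBMS, so none of them has to repeat the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint is declared once, in the schema, not in application code.
conn.execute("""
    CREATE TABLE account (
        account_id TEXT PRIMARY KEY,
        balance    NUMERIC NOT NULL CHECK (balance > 0)
    )
""")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")

def try_insert(account_id, balance):
    """Return True if the DBMS accepted the row, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO account VALUES (?, ?)", (account_id, balance))
        return True
    except sqlite3.IntegrityError:  # raised for a failed CHECK constraint
        return False

accepted = try_insert("A-102", 250)   # satisfies balance > 0
rejected = try_insert("A-103", -40)   # violates balance > 0, row is not stored
```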
Drawbacks of using file systems to store data (Cont.)
• Atomicity of updates
– Failures may leave database in an inconsistent state with partial updates carried
out
– Example: Transfer of funds from one account to another should either complete
or not happen at all
• Concurrent access by multiple users
– Concurrent access needed for performance
– Uncontrolled concurrent accesses can lead to inconsistencies
• Example: Two people read a balance (say 100) and each withdraws money (say 50) at the same time; if both read 100 before either writes, the final balance can be 50 instead of 0
• Security problems
– Hard to provide user access to some, but not all, data

Database systems offer solutions to all the above problems
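The funds-transfer example from the atomicity bullet can be written as a single transaction. In this minimal sketch (sqlite3 as the DBMS; the `account` table, the no-overdraft rule `balance >= 0` and the `transfer` helper are assumptions), the credit runs first. When the debit then violates the constraint, the whole transaction is rolled back and the credit disappears with it, so no partial update survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")  # assumed no-overdraft rule
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE account_id = ?",
                         (amount, dst))
            # Debit second: if this violates the CHECK, the credit above is undone too.
            conn.execute("UPDATE account SET balance = balance - ? WHERE account_id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT account_id, balance FROM account"))

ok = transfer(conn, "A", "B", 30)        # succeeds: A=70, B=80
failed = transfer(conn, "A", "B", 500)   # would overdraw A, so nothing changes
```

With a plain file system, the program itself would have to detect the failure and undo the credit. Here the DBMS does that.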
Levels of Abstraction
• Physical level: describes how a record (e.g., instructor) is stored.
• Logical level: describes data stored in database, and the relationships
among the data.
type instructor = record
ID : string;
name : string;
dept_name : string;
salary : integer;
end;
• View level: application programs hide details of data types. Views can
also hide information (such as an employee’s salary) for security
purposes.
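At the view level, a view can hide the salary column. Below is a minimal sketch using sqlite3. The view name `instructor_public` is hypothetical, and the names and salaries are made-up sample data. SQLite has no GRANT statement, so the sketch only shows the hiding. A server DBMS would also revoke direct access to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, "
             "dept_name TEXT, salary NUMERIC)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)",
                 [("10101", "Srinivasan", "Comp. Sci.", 65000),
                  ("22222", "Einstein", "Physics", 95000)])

# View level: expose every attribute except salary.
conn.execute("CREATE VIEW instructor_public AS "
             "SELECT ID, name, dept_name FROM instructor")

cur = conn.execute("SELECT * FROM instructor_public ORDER BY ID")
view_columns = [d[0] for d in cur.description]  # column names seen by the application
view_rows = cur.fetchall()
```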
View of Data
An architecture for a database system
Instances and Schemas
• Similar to types and variables in programming languages
• Logical Schema – the overall logical structure of the database
– Example: The database consists of information about a set of customers and
accounts in a bank and the relationship between them
• Analogous to type information of a variable in a program
• Physical schema – the overall physical structure of the database
• Instance – the actual content of the database at a particular point in time
– Analogous to the value of a variable
• Physical Data Independence – the ability to modify the physical schema
without changing the logical schema
– Applications depend on the logical schema
– In general, the interfaces between the various levels and components should
be well defined so that changes in some parts do not seriously influence
others.
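Physical data independence means a physical-level change, such as adding an index, should not force changes to queries written against the logical schema. The sketch below uses sqlite3 as the DBMS, with illustrative sample rows and a hypothetical index name. It runs the same query before and after creating an index and gets the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, "
             "dept_name TEXT, salary NUMERIC)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)",
                 [("10101", "Srinivasan", "Comp. Sci.", 65000),
                  ("12121", "Wu", "Finance", 90000),
                  ("45565", "Katz", "Comp. Sci.", 75000)])

# The application's query refers only to the logical schema.
QUERY = "SELECT name FROM instructor WHERE dept_name = ? ORDER BY name"

before = conn.execute(QUERY, ("Comp. Sci.",)).fetchall()

# Physical-level change: add an index; tables and columns are untouched.
conn.execute("CREATE INDEX idx_instructor_dept ON instructor (dept_name)")

after = conn.execute(QUERY, ("Comp. Sci.",)).fetchall()
```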
Data Models
• A collection of tools for describing
– Data
– Data relationships
– Data semantics
– Data constraints
• Relational model
• Entity-Relationship data model (mainly for database
design)
• Object-based data models (Object-oriented and Object-
relational)
• Semistructured data model (XML)
• Other older models:
– Network model
– Hierarchical model
Relational Model
• All the data is stored in various tables.
• Example of tabular data in the relational model [figure: a table whose columns and rows are labeled]
A Sample Relational Database
Data Definition Language (DDL)
• Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))

• DDL compiler generates a set of table templates stored in a data dictionary
• Data dictionary contains metadata (i.e., data about data)
– Database schema
– Integrity constraints
• Primary key (ID uniquely identifies instructors)
– Authorization
• Who can access what
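The data dictionary can be queried like any other data. Below is a minimal sketch using sqlite3, whose catalog is the `sqlite_master` table and whose per-column metadata comes from `PRAGMA table_info`. Both are SQLite-specific; other systems expose their dictionaries differently, for example through `information_schema`. A `PRIMARY KEY` clause has been added to the DDL from this slide so that the constraint "ID uniquely identifies instructors" is recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instructor (
        ID        CHAR(5) PRIMARY KEY,
        name      VARCHAR(20),
        dept_name VARCHAR(20),
        salary    NUMERIC(8,2)
    )
""")

# sqlite_master: one row per table, index, view or trigger in the schema.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
columns = {name: (col_type, pk)
           for _cid, name, col_type, _notnull, _default, pk
           in conn.execute("PRAGMA table_info(instructor)")}
```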
Data Manipulation Language (DML)
• Language for accessing and manipulating the data organized
by the appropriate data model
– DML also known as query language
• Two classes of languages
– Pure – used for proving properties about computational power
and for optimization
• Relational Algebra
• Tuple relational calculus
• Domain relational calculus
– Commercial – used in commercial systems
• SQL is the most widely used commercial language
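To illustrate the "pure" side, here is a minimal sketch of two relational-algebra operators, selection (σ) and projection (π), over a relation stored as a Python set of tuples. The helper names and sample data are hypothetical. The composed expression corresponds to the SQL query in the comment.

```python
# instructor relation as a set of (ID, name, dept_name, salary) tuples; sample data only.
instructor = {
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("12121", "Wu", "Finance", 90000),
    ("22222", "Einstein", "Physics", 95000),
}
ATTRS = ("ID", "name", "dept_name", "salary")  # schema assumed for both helpers

def select(relation, predicate):
    """sigma_predicate(relation): keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(dict(zip(ATTRS, t)))}

def project(relation, *names):
    """pi_names(relation): keep only the named attributes; the set drops duplicates."""
    idx = [ATTRS.index(n) for n in names]
    return {tuple(t[i] for i in idx) for t in relation}

# pi_name(sigma_salary>80000(instructor)), i.e.
#   SELECT DISTINCT name FROM instructor WHERE salary > 80000
result = project(select(instructor, lambda r: r["salary"] > 80000), "name")
```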
SQL

• The most widely used commercial language


• SQL is NOT a Turing machine equivalent language
• To be able to compute complex functions SQL is usually
embedded in some higher-level language
• Application programs generally access databases through one of
– Language extensions to allow embedded SQL
– Application program interface (e.g., ODBC/JDBC) which allow SQL
queries to be sent to a database
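Python's DB-API (PEP 249) plays the same role as ODBC/JDBC. The host program sends SQL text with parameters to the DBMS and then processes the result in the host language. The sketch below uses sqlite3; `instructors_in` and the 10% raise are hypothetical, and the data is illustrative.

```python
import sqlite3

def instructors_in(conn, dept):
    # Fixed SQL text; the department value is passed as a parameter, not pasted in.
    cur = conn.execute(
        "SELECT name, salary FROM instructor WHERE dept_name = ? ORDER BY name",
        (dept,))
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT, "
             "dept_name TEXT, salary NUMERIC)")
conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)",
                 [("10101", "Srinivasan", "Comp. Sci.", 65000),
                  ("45565", "Katz", "Comp. Sci.", 75000),
                  ("22222", "Einstein", "Physics", 95000)])

rows = instructors_in(conn, "Comp. Sci.")
# Logic written in the host language, applied to the query result (integer arithmetic).
raised = {name: salary * 110 // 100 for name, salary in rows}
```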
Database Design
The process of designing the general structure of the database:

• Logical Design – Deciding on the database schema. Database design requires that we find a “good” collection of relation schemas.
– Business decision – What attributes should we record in the database?
– Computer Science decision – What relation schemas should we have
and how should the attributes be distributed among the various
relation schemas?
• Physical Design – Deciding on the physical layout of the database
Database Design (Cont.)
• Is there any problem with this relation?
Design Approaches
• Need to come up with a methodology to
ensure that each of the relations in the
database is “good”
• Two ways of doing so:
– Entity Relationship Model (Chapter 7)
• Models an enterprise as a collection of entities and
relationships
• Represented diagrammatically by an entity-
relationship diagram:
– Normalization Theory (Chapter 8)
• Formalize what designs are bad, and test for them
Object-Relational Data Models
• Relational model: flat, “atomic” values
• Object Relational Data Models
– Extend the relational data model by including object orientation
and constructs to deal with added data types.
– Allow attributes of tuples to have complex types, including non-
atomic values such as nested relations.
– Preserve relational foundations, in particular the declarative
access to data, while extending modeling power.
– Provide upward compatibility with existing relational languages.
XML: Extensible Markup Language
• Defined by the WWW Consortium (W3C)
• Originally intended as a document markup language
not a database language
• The ability to specify new tags, and to create nested tag
structures made XML a great way to exchange data,
not just documents
• XML has become the basis for many data interchange formats.
• A wide variety of tools is available for parsing,
browsing and querying XML documents/data
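Below is a minimal sketch of parsing and querying nested XML data with Python's standard `xml.etree.ElementTree`. The tags `university`, `department` and `course` are invented for the example, which is exactly the freedom XML gives. The course data is illustrative only.

```python
import xml.etree.ElementTree as ET

doc = """
<university>
  <department name="Comp. Sci." building="Taylor">
    <course id="CS-101" credits="4">Intro. to Computer Science</course>
    <course id="CS-190" credits="4">Game Design</course>
  </department>
  <department name="Physics" building="Watson">
    <course id="PHY-101" credits="4">Physical Principles</course>
  </department>
</university>
"""

root = ET.fromstring(doc.strip())
# Path query over the nested structure: course titles in the Comp. Sci. department.
cs_titles = [c.text for c in root.findall("department[@name='Comp. Sci.']/course")]
# iter() walks every <course> element at any depth.
total_credits = sum(int(c.get("credits")) for c in root.iter("course"))
```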
Database Engine
• Storage manager
• Query processing
• Transaction manager
Storage Management
• Storage manager is a program module that provides the
interface between the low-level data stored in the database and
the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
– Interaction with the OS file manager
– Efficient storing, retrieving and updating of data
• Issues:
– Storage access
– File organization
– Indexing and hashing
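To show the idea behind hashing-based indexing, here is a toy in-memory hash index. It maps each search-key value to the positions of matching records, so a lookup is one hash probe instead of a full scan. This is a simplification: a real storage manager keeps buckets in disk blocks. The helper names are hypothetical.

```python
from collections import defaultdict

# A "file" of records in insertion order; a record's position acts as its address.
records = [
    ("10101", "Srinivasan", "Comp. Sci."),
    ("12121", "Wu", "Finance"),
    ("45565", "Katz", "Comp. Sci."),
]

def build_hash_index(records, key_pos):
    """Map each search-key value to the list of positions of records having it."""
    index = defaultdict(list)
    for pos, rec in enumerate(records):
        index[rec[key_pos]].append(pos)
    return index

def lookup(index, records, key):
    # One dictionary probe, then direct access to the matching records.
    return [records[pos] for pos in index.get(key, [])]

dept_index = build_hash_index(records, key_pos=2)  # index on dept_name
```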
Query Processing

1. Parsing and translation
2. Optimization
3. Evaluation
Query Processing (Cont.)
• Alternative ways of evaluating a given query
– Equivalent expressions
– Different algorithms for each operation
• Cost difference between a good and a bad way of
evaluating a query can be enormous
• Need to estimate the cost of operations
– Depends critically on statistical information about relations
which the database must maintain
– Need to estimate statistics for intermediate results to compute
cost of complex expressions
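The sketch below gives a rough sense of how much evaluation algorithms can differ. It counts unit operations for two ways of computing the same join: nested-loop join, about |r|·|s| comparisons, and hash join, about |r| + |s| build and probe steps. Counting comparisons is a simplification; real optimizers estimate disk block transfers and seeks using statistics. The data sizes are arbitrary.

```python
def nested_loop_join(r, s, key_r, key_s):
    """Compare every tuple of r with every tuple of s."""
    out, comparisons = [], 0
    for tr in r:
        for ts in s:
            comparisons += 1
            if tr[key_r] == ts[key_s]:
                out.append(tr + ts)
    return out, comparisons

def hash_join(r, s, key_r, key_s):
    """Build a hash table on s, then probe it once per tuple of r."""
    table = {}
    for ts in s:
        table.setdefault(ts[key_s], []).append(ts)
    out, probes = [], 0
    for tr in r:
        probes += 1
        for ts in table.get(tr[key_r], []):
            out.append(tr + ts)
    return out, len(s) + probes  # build steps + probe steps

r = [(i, i % 10) for i in range(1000)]      # 1000 tuples; second field is a dept number
s = [(d, f"dept{d}") for d in range(10)]    # 10 department tuples
nl, nl_cost = nested_loop_join(r, s, 1, 0)
hj, hj_cost = hash_join(r, s, 1, 0)
```

Both functions return the same 1000 result tuples, but the nested-loop version does 10000 comparisons against 1010 steps for the hash join.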
Transaction Management
• What if the system fails?
• What if more than one user is concurrently updating
the same data?
• A transaction is a collection of operations that
performs a single logical function in a database
application
• Transaction-management component ensures that
the database remains in a consistent (correct) state
despite system failures (e.g., power failures and
operating system crashes) and transaction failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions, to
ensure the consistency of the database.
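The withdrawal example from the file-system drawbacks (balance 100, two withdrawals of 50) shows why a concurrency-control manager is needed. The simulation below is deterministic: it runs read and write steps of two transactions in a chosen order. A serial order ends at 0, while the interleaved order loses one update and ends at 50. The schedule format and `run_schedule` are hypothetical.

```python
def run_schedule(schedule, initial_balance=100, withdrawal=50):
    """Execute (txn, op) steps in order against one shared balance.

    "read" copies the shared balance into the transaction's local variable;
    "write" stores that local value minus the withdrawal back.
    """
    balance = initial_balance
    local = {}
    for txn, op in schedule:
        if op == "read":
            local[txn] = balance
        elif op == "write":
            balance = local[txn] - withdrawal
    return balance

serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]
```

A concurrency-control manager prevents schedules like `interleaved`, for example by making T2 wait until T1 finishes.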
Database Users and Administrators

Database System Internals
Database Architecture

The architecture of a database system is greatly influenced by the underlying computer system on which the database is running:
• Centralized
• Client-server
• Parallel (multi-processor)
• Distributed
History of Database Systems
• 1950s and early 1960s:
– Data processing using magnetic tapes for storage
• Tapes provided only sequential access
– Punched cards for input
• Late 1960s and 1970s:
– Hard disks allowed direct access to data
– Network and hierarchical data models in widespread use
– Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley begins Ingres prototype
– High-performance (for the era) transaction processing
History (cont.)
• 1980s:
– Research relational prototypes evolve into commercial systems
• SQL becomes the industry standard
– Parallel and distributed database systems
– Object-oriented database systems
• 1990s:
– Large decision support and data-mining applications
– Large multi-terabyte data warehouses
– Emergence of Web commerce
• Early 2000s:
– XML and XQuery standards
– Automated database administration
• Later 2000s:
– Giant data storage systems
• Google BigTable, Yahoo PNUTS, Amazon, ..
ER model -- Database Modeling
• The ER data model was developed to facilitate database design by
allowing specification of an enterprise schema that represents the
overall logical structure of a database.
• The ER model is very useful in mapping the meanings and
interactions of real-world enterprises onto a conceptual schema.
Because of this usefulness, many database-design tools draw on
concepts from the ER model.
• The ER data model employs three basic concepts:
– entity sets,
– relationship sets,
– attributes.
• The ER model also has an associated diagrammatic representation,
the ER diagram, which can express the overall logical structure of
a database graphically.
Entity Sets
• An entity is an object that exists and is distinguishable from
other objects.
– Example: specific person, company, event, plant
• An entity set is a set of entities of the same type that share
the same properties.
– Example: set of all persons, companies, trees, holidays
• An entity is represented by a set of attributes; i.e.,
descriptive properties possessed by all members of an entity
set.
– Example:
instructor = (ID, name, street, city, salary )
course= (course_id, title, credits)
• A subset of the attributes forms a primary key of the entity
set, i.e., it uniquely identifies each member of the set.
Entity Sets -- instructor and student
[figure: the instructor entity set (instructor_ID, instructor_name) and the student entity set (student_ID, student_name)]
Relationship Sets
• A relationship is an association among several entities
Example: the student entity 44553 (Peltier) is related to the instructor entity 22222 (Einstein) through the relationship set advisor
• A relationship set is a mathematical relation among $n \geq 2$
entities, each taken from entity sets
$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$
where $(e_1, e_2, \ldots, e_n)$ is a relationship
– Example:
$(44553, 22222) \in \mathit{advisor}$
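The set definition above can be checked directly: every tuple of a relationship set must take its i-th component from the i-th entity set. Below is a minimal sketch with entity sets keyed by primary key. The second student and instructor are sample data, and `is_relationship_set` is a hypothetical helper. Testing each component against its entity set is equivalent to testing membership in $E_1 \times \cdots \times E_n$, without building that product.

```python
# Entity sets keyed by primary key (names kept only for readability).
student = {"44553": "Peltier", "12345": "Shankar"}
instructor = {"22222": "Einstein", "10101": "Srinivasan"}

# advisor as a set of (student ID, instructor ID) pairs.
advisor = {("44553", "22222"), ("12345", "10101")}

def is_relationship_set(rel, *entity_sets):
    """True if every tuple in rel is an element of E1 x E2 x ... x En."""
    return all(
        len(t) == len(entity_sets)
        and all(e in E for e, E in zip(t, entity_sets))
        for t in rel
    )
```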
Relationship Set advisor
Relationship Sets (Cont.)
• An attribute can also be associated with a relationship set.
• For instance, the advisor relationship set between entity sets instructor
and student may have the attribute date which tracks when the student
started being associated with the advisor
Degree of a Relationship Set
• binary relationship
– involve two entity sets (or degree two).
– most relationship sets in a database system are binary.
• Relationships between more than two entity sets are
rare. (More on this later.)
– Example: students work on research projects under the
guidance of an instructor.
– relationship proj_guide is a ternary relationship between
instructor, student, and project
Mapping Cardinality Constraints
• Express the number of entities to which another entity
can be associated via a relationship set.
• Most useful in describing binary relationship sets.
• For a binary relationship set the mapping cardinality
must be one of the following types:
– One to one
– One to many
– Many to one
– Many to many
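Mapping cardinality is a constraint on every legal instance, so one instance cannot prove it. An instance can only show which cardinalities it does not violate. The sketch below reports the tightest mapping cardinality that a given set of (a, b) pairs of a binary relationship set R between A and B is consistent with. The function name is hypothetical.

```python
from collections import Counter

def observed_cardinality(pairs):
    """Tightest mapping cardinality from A to B consistent with this instance."""
    a_counts = Counter(a for a, _ in pairs)  # how many b's each a is related to
    b_counts = Counter(b for _, b in pairs)  # how many a's each b is related to
    a_side_one = all(c <= 1 for c in a_counts.values())  # each a -> at most one b
    b_side_one = all(c <= 1 for c in b_counts.values())  # each b -> at most one a
    if a_side_one and b_side_one:
        return "one to one"
    if b_side_one:
        return "one to many"   # an a may relate to many b's; each b to at most one a
    if a_side_one:
        return "many to one"   # many a's may share a b; each a to at most one b
    return "many to many"
```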
Mapping Cardinalities

[figure panels: one to one; one to many]

Note: Some elements in A and B may not be mapped to any elements in the other set
Mapping Cardinalities

[figure panels: many to one; many to many]

Note: Some elements in A and B may not be mapped to any elements in the other set
Complex Attributes
• Attribute types:
– Simple and composite attributes.
– Single-valued and multivalued attributes
• Example: multivalued attribute: phone_numbers
– Derived attributes
• Can be computed from other attributes
• Example: age, given date_of_birth
• Domain – the set of permitted values for
each attribute
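A derived attribute is computed on demand rather than stored, so it never goes stale. Below is a minimal sketch of age derived from date_of_birth. The reference date `today` is passed in explicitly (an assumption made so the result is reproducible).

```python
from datetime import date

def age(date_of_birth: date, today: date) -> int:
    """Derived attribute: whole years elapsed since date_of_birth as of `today`."""
    years = today.year - date_of_birth.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years
```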
Composite Attributes
Redundant Attributes
• Suppose we have entity sets:
– instructor, with attributes: ID, name, dept_name, salary
– department, with attributes: dept_name, building, budget
• We model the fact that each instructor has an associated
department using a relationship set inst_dept
• The attribute dept_name appears in both entity sets. Since
it is the primary key for the entity set department, it
replicates information present in the relationship and is
therefore redundant in the entity set instructor and needs to
be removed.
• BUT: when converting back to tables, in some cases the
attribute gets reintroduced, as we will see later.
Weak Entity Sets
• Consider a section entity, which is uniquely identified by a
course_id, semester, year, and sec_id.
• Clearly, section entities are related to course entities. Suppose we
create a relationship set sec_course between entity sets section and
course.
• Note that the information in sec_course is redundant, since section
already has an attribute course_id, which identifies the course with
which the section is related.
• One option to deal with this redundancy is to get rid of the
relationship sec_course; however, by doing so the relationship
between section and course becomes implicit in an attribute, which
is not desirable.
Weak Entity Sets (Cont.)
• An alternative way to deal with this redundancy is to not store the
attribute course_id in the section entity and to only store the remaining
attributes sec_id, year, and semester. However, the entity set
section then does not have enough attributes to identify a particular
section entity uniquely; although each section entity is distinct, sections
for different courses may share the same sec_id, year, and
semester.
• To deal with this problem, we treat the relationship sec_course as a
special relationship that provides extra information, in this case, the
course_id, required to identify section entities uniquely.
• The notion of weak entity set formalizes the above intuition. A weak
entity set is one whose existence is dependent on another entity, called
its identifying entity; instead of associating a primary key with a weak
entity, we use the identifying entity, along with extra attributes called the
discriminator, to uniquely identify a weak entity. An entity set that is
not a weak entity set is termed a strong entity set.
Weak Entity Sets (Cont.)
• Every weak entity must be associated with an
identifying entity; that is, the weak entity set is
said to be existence dependent on the identifying
entity set. The identifying entity set is said to own
the weak entity set that it identifies. The
relationship associating the weak entity set with
the identifying entity set is called the identifying
relationship.
• Note that the relational schema we eventually
create from the entity set section does have the
attribute course_id, for reasons that will become
clear later, even though we have dropped the
attribute course_id from the entity set section.
E-R Diagrams
Entity Sets
• Entities can be represented graphically as follows:
– Rectangles represent entity sets.
– Attributes listed inside entity rectangle
– Underline indicates primary key attributes
Relationship Sets

• Diamonds represent relationship sets.


Relationship Sets with Attributes
Roles
• Entity sets of a relationship need not be distinct
– Each occurrence of an entity set plays a “role” in the relationship
• The labels “course_id” and “prereq_id” are called roles.
Cardinality Constraints
• We express cardinality constraints by drawing either a directed line
(→), signifying “one,” or an undirected line (—), signifying
“many,” between the relationship set and the entity set.

• One-to-one relationship between an instructor and a student:
– An instructor is associated with at most one student via the relationship advisor
– A student is associated with at most one instructor via the relationship advisor
One-to-Many Relationship
• one-to-many relationship between an instructor and a
student
– an instructor is associated with several (including 0) students via
advisor
– a student is associated with at most one instructor via advisor
Many-to-One Relationships

• In a many-to-one relationship between an instructor and a student,
– an instructor is associated with at most one student via advisor,
– and a student is associated with several (including 0) instructors
via advisor
Many-to-Many Relationship
• An instructor is associated with several (possibly 0)
students via advisor
• A student is associated with several (possibly 0)
instructors via advisor
Total and Partial Participation

• Total participation (indicated by double line): every entity in the entity set participates in at least one relationship in the relationship set
– Example: participation of student in advisor is total; every student must have an associated instructor
• Partial participation: some entities may not participate in any relationship in the relationship set
– Example: participation of instructor in advisor is partial
Notation for Expressing More Complex Constraints

• A line may have an associated minimum and maximum cardinality, shown in the form l..h, where l is the minimum and h the maximum cardinality
– A minimum value of 1 indicates total participation.
– A maximum value of 1 indicates that the entity participates in at most one relationship
– A maximum value of * indicates no limit.
• Example: an instructor can advise 0 or more students. A student must have exactly 1 advisor and cannot have multiple advisors.
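An l..h label on the line between an entity set and a relationship set bounds how many relationships each entity appears in. Below is a minimal checker for one such label over an instance. The function name, the tuple layout (s_id, i_id) and the sample IDs are assumptions, and `high=None` stands for `*`. The example checks student 1..1 and instructor 0..* on advisor, and shows that total participation (1..*) of instructor fails.

```python
from collections import Counter

def satisfies(entities, rel, position, low, high=None):
    """Check an l..h cardinality: every entity appears in low..high relationships.

    entities: primary keys of the entity set; rel: set of relationship tuples;
    position: index of this entity set's key within each tuple; high=None means '*'.
    """
    counts = Counter(t[position] for t in rel)
    for e in entities:
        n = counts.get(e, 0)
        if n < low or (high is not None and n > high):
            return False
    return True

students = {"44553", "12345"}
instructors = {"22222", "10101", "98345"}
advisor = {("44553", "22222"), ("12345", "22222")}  # (s_id, i_id)
```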
Notation to Express Entity with Complex Attributes
Expressing Weak Entity Sets

• In E-R diagrams, a weak entity set is depicted via a double rectangle.
• We underline the discriminator of a weak entity set with a dashed
line.
• The relationship set connecting the weak entity set to the
identifying strong entity set is depicted by a double diamond.
• Primary key for section – (course_id, sec_id, semester, year)
E-R Diagram for a University Enterprise
Reduction to Relation Schemas
• Entity sets and relationship sets can be expressed
uniformly as relation schemas that represent the contents
of the database.
• A database which conforms to an E-R diagram can be
represented by a collection of schemas.
• For each entity set and relationship set there is a unique
schema that is assigned the name of the corresponding
entity set or relationship set.
• Each schema has a number of columns (generally
corresponding to attributes), which have unique names.
Representing Entity Sets
• A strong entity set reduces to a schema with the same attributes
student(ID, name, tot_cred)

• A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set
section (course_id, sec_id, semester, year)
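Below is a minimal sketch of the section table in sqlite3. The course data is illustrative, and the column types and `ON DELETE CASCADE` are design assumptions. The primary key combines the owner's key course_id with the discriminator, so the same sec_id, semester and year can repeat across courses. The cascading foreign key models existence dependence: deleting a course removes its sections. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT, credits INTEGER)")
conn.execute("""
    CREATE TABLE section (
        course_id TEXT,     -- primary key of the identifying entity set
        sec_id    TEXT,     -- discriminator
        semester  TEXT,     -- discriminator
        year      INTEGER,  -- discriminator
        PRIMARY KEY (course_id, sec_id, semester, year),
        FOREIGN KEY (course_id) REFERENCES course (course_id) ON DELETE CASCADE
    )
""")
conn.execute("INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science', 4)")
conn.execute("INSERT INTO course VALUES ('BIO-101', 'Intro. to Biology', 4)")
# The same (sec_id, semester, year) occurs for two different courses.
conn.executemany("INSERT INTO section VALUES (?, ?, ?, ?)",
                 [("CS-101", "1", "Fall", 2017), ("BIO-101", "1", "Fall", 2017)])

# Existence dependence: deleting the owning course deletes its sections.
conn.execute("DELETE FROM course WHERE course_id = 'CS-101'")
remaining = conn.execute("SELECT course_id FROM section").fetchall()
```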
Representing Relationship Sets
• A many-to-many relationship set is represented as a schema
with attributes for the primary keys of the two participating
entity sets, and any descriptive attributes of the relationship set.
• Example: schema for relationship set advisor
advisor = (s_id, i_id)
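Below is a minimal sqlite3 sketch of the advisor schema for a many-to-many relationship set. Its primary key is the union of the participants' primary keys, and each column references its entity set. The sample IDs and names are illustrative only. One student has two advisors and one instructor has two advisees, which the composite key allows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE instructor (ID TEXT PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE advisor (
        s_id TEXT REFERENCES student (ID),
        i_id TEXT REFERENCES instructor (ID),
        PRIMARY KEY (s_id, i_id)   -- union of the participants' primary keys
    )
""")
conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [("44553", "Peltier"), ("12345", "Shankar")])
conn.executemany("INSERT INTO instructor VALUES (?, ?)",
                 [("22222", "Einstein"), ("10101", "Srinivasan")])
conn.executemany("INSERT INTO advisor VALUES (?, ?)",
                 [("44553", "22222"), ("44553", "10101"), ("12345", "22222")])

advisees_of_22222 = [r[0] for r in conn.execute(
    "SELECT s.name FROM advisor a JOIN student s ON s.ID = a.s_id "
    "WHERE a.i_id = '22222' ORDER BY s.name")]
```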
Representation of Entity Sets with Composite Attributes

• Composite attributes are flattened out by creating a separate attribute for each component attribute
– Example: given entity set instructor with composite
attribute name with component attributes first_name
and last_name the schema corresponding to the entity
set has two attributes name_first_name and
name_last_name
• Prefix omitted if there is no ambiguity (name_first_name
could be first_name)
• Ignoring multivalued attributes, extended instructor
schema is
– instructor(ID,
first_name, middle_initial, last_name,
street_number, street_name,
apt_number, city, state, zip_code,
date_of_birth)
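The flattening rule can be written as a short recursive function. In this sketch, `None` marks a simple attribute and a dict marks a composite one. The nesting of address and street is an assumption, chosen so that the flattened names match the extended instructor schema above. With `keep_prefix=False`, the prefixes are omitted, as the slide allows when there is no ambiguity.

```python
def flatten(attrs, keep_prefix=True, prefix=""):
    """Turn nested composite attributes into a flat list of column names."""
    columns = []
    for name, components in attrs.items():
        full = prefix + name if keep_prefix else name
        if components is None:          # simple attribute: becomes one column
            columns.append(full)
        else:                           # composite: recurse into its components
            columns.extend(flatten(components, keep_prefix, full + "_"))
    return columns

instructor = {
    "ID": None,
    "name": {"first_name": None, "middle_initial": None, "last_name": None},
    "address": {
        "street": {"street_number": None, "street_name": None, "apt_number": None},
        "city": None, "state": None, "zip_code": None,
    },
    "date_of_birth": None,
}
```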
Representation of Entity Sets with Multivalued Attributes

• A multivalued attribute M of an entity E is represented by a separate schema EM
• Schema EM has attributes corresponding to the primary
key of E and an attribute corresponding to multivalued
attribute M
• Example: Multivalued attribute phone_number of
instructor is represented by a schema:
inst_phone= ( ID, phone_number)
• Each value of the multivalued attribute maps to a
separate tuple of the relation on schema EM
– For example, an instructor entity with primary key 22222 and
phone numbers 456-7890 and 123-4567 maps to two tuples:
(22222, 456-7890) and (22222, 123-4567)
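The mapping from a multivalued attribute to schema EM produces one tuple per value. The sketch below uses hypothetical helper and variable names and reproduces the inst_phone example. It also shows that an entity with no values for M contributes no tuples.

```python
def multivalued_to_tuples(entities, key, attr):
    """Relation on schema EM = (primary key of E, M): one tuple per value of M."""
    return [(e[key], value) for e in entities for value in e[attr]]

instructors = [
    {"ID": 22222, "name": "Einstein", "phone_number": ["456-7890", "123-4567"]},
    {"ID": 10101, "name": "Srinivasan", "phone_number": []},  # no phones, no tuples
]
inst_phone = multivalued_to_tuples(instructors, "ID", "phone_number")
```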

Redundancy of Schemas
Many-to-one and one-to-many relationship sets that are total on the
many-side can be represented by adding an extra attribute to the
“many” side, containing the primary key of the “one” side
• Example: Instead of creating a schema for relationship set inst_dept,
add an attribute dept_name to the schema arising from entity set
instructor
Redundancy of Schemas (Cont.)
• For one-to-one relationship sets, either side
can be chosen to act as the “many” side
– That is, an extra attribute can be added to
either of the tables corresponding to the two
entity sets
• If participation is partial on the “many”
side, replacing a schema by an extra
attribute in the schema corresponding to
the “many” side could result in null values
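Below is a minimal sqlite3 sketch of the inst_dept example: dept_name is stored on the "many" side (instructor) instead of in a separate inst_dept table. The sample rows, including the instructor "Newcomer", are hypothetical. The second insert shows the null value that appears when participation is partial. If participation were total, dept_name could also be declared `NOT NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY, "
             "building TEXT, budget NUMERIC)")
# No inst_dept table: the "one" side's primary key is stored on the "many" side.
conn.execute("""
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT REFERENCES department (dept_name),  -- replaces inst_dept
        salary    NUMERIC
    )
""")
conn.execute("INSERT INTO department VALUES ('Physics', 'Watson', 70000)")
conn.execute("INSERT INTO instructor VALUES ('22222', 'Einstein', 'Physics', 95000)")
# Partial participation: an instructor with no department gets a NULL dept_name.
conn.execute("INSERT INTO instructor VALUES ('99999', 'Newcomer', NULL, 50000)")

unassigned = [r[0] for r in conn.execute(
    "SELECT name FROM instructor WHERE dept_name IS NULL")]
```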
Redundancy of Schemas (Cont.)
• The schema corresponding to a relationship set linking
a weak entity set to its identifying strong entity set is
redundant.

• Example: The section schema already contains the attributes that would appear in the sec_course schema
Binary Vs. Non-Binary Relationships

• Although it is possible to replace any non-binary (n-ary, for n > 2) relationship set by a number of distinct binary relationship sets, an n-ary relationship set shows more clearly that several entities participate in a single relationship.
• Some relationships that appear to be non-binary may be
better represented using binary relationships
– For example, a ternary relationship parents, relating a child to
his/her father and mother, is best replaced by two binary
relationships, father and mother
• Using two binary relationships allows partial information (e.g.,
only mother being known)
– But there are some relationships that are naturally non-binary
• Example: proj_guide
Converting Non-Binary Relationships to Binary Form

• In general, any non-binary relationship can be represented using binary relationships by creating an artificial entity set.
– Replace R between entity sets A, B and C by an entity set E, and three relationship sets:
1. RA, relating E and A
2. RB, relating E and B
3. RC, relating E and C
– Create an identifying attribute for E and add any attributes of R to E
– For each relationship (ai, bi, ci) in R:
1. create a new entity ei in the entity set E
2. add (ei, ai) to RA
3. add (ei, bi) to RB
4. add (ei, ci) to RC
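The conversion steps can be sketched directly. Each triple of R becomes a fresh entity of E, which gets a generated identifier (the "e1", "e2", … scheme is an assumption) and one pair in each of RA, RB and RC. The reverse function rebuilds R. It works only because every e is related to exactly one a, one b and one c, which is the kind of constraint the exercise asks you to add. The proj_guide triples (instructor, student, project) are hypothetical.

```python
from itertools import count

def ternary_to_binary(R):
    """Replace R, a set of (a, b, c) triples, by entity set E and RA, RB, RC."""
    E, RA, RB, RC = set(), set(), set(), set()
    ids = count(1)
    for a, b, c in sorted(R):       # sorted only to make the generated ids repeatable
        e = f"e{next(ids)}"         # artificial identifying attribute of E
        E.add(e)
        RA.add((e, a))
        RB.add((e, b))
        RC.add((e, c))
    return E, RA, RB, RC

def binary_to_ternary(E, RA, RB, RC):
    """Rebuild R; assumes each e appears in exactly one pair of RA, RB and RC."""
    a_of, b_of, c_of = dict(RA), dict(RB), dict(RC)
    return {(a_of[e], b_of[e], c_of[e]) for e in E}

proj_guide = {("Einstein", "Peltier", "P1"), ("Einstein", "Shankar", "P1"),
              ("Srinivasan", "Peltier", "P2")}
E, RA, RB, RC = ternary_to_binary(proj_guide)
```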
Converting Non-Binary Relationships (Cont.)

• Also need to translate constraints
– Translating all constraints may not be possible
– There may be instances in the translated schema that
cannot correspond to any instance of R
• Exercise: add constraints to the relationships RA, RB and RC to
ensure that a newly created entity corresponds to exactly one
entity in each of entity sets A, B and C
– We can avoid creating an identifying attribute by making
E a weak entity set (described shortly) identified by the
three relationship sets
E-R Design Decisions
• The use of an attribute or entity set to represent an object.
• Whether a real-world concept is best expressed by an
entity set or a relationship set.
• The use of a ternary relationship versus a pair of binary
relationships.
• The use of a strong or weak entity set.
• The use of specialization/generalization – contributes to
modularity in the design.
• The use of aggregation – can treat the aggregate entity
set as a single unit without concern for the details of its
internal structure.
Summary of Symbols Used in E-R Notation
Symbols Used in E-R Notation (Cont.)
