Chapter 1 Intro

This document introduces database management systems. A DBMS is software designed to store and manage large, integrated collections of data that model real-world entities and relationships. Compared with file-based storage, a DBMS offers efficient concurrent access, independence from physical storage structures, security, and recovery from crashes. The document covers data models, schemas that describe stored data, transaction management for consistent concurrent access, and the levels of abstraction that separate the conceptual (logical) and physical views of data.

Uploaded by

Dang Phu Quy

Database Management Systems

Chapter 1

INTRODUCTION TO DATABASE MANAGEMENT SYSTEMS

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke


Chapter Outline
• Overview
• A historical perspective
• Files vs. DBMS
• Advantages of a DBMS
• Describing and storing data in a DBMS
• Transaction management
• Structure of a DBMS


1. Overview

What Is a DBMS?
• A database is a very large, integrated collection of data.
• It models a real-world enterprise:
  - Entities (e.g., students, courses)
  - Relationships (e.g., Madonna is taking CS564)
• A Database Management System (DBMS) is a software package designed to store and manage databases.


2. A HISTORICAL PERSPECTIVE
• The first general-purpose DBMS, the Integrated Data Store, was designed by Charles Bachman at General Electric in the early 1960s. It formed the basis for the network data model. (In 1973 Bachman received ACM's Turing Award, the first given for work in the database area.)
• In the late 1960s, IBM developed the Information Management System (IMS). IMS formed the basis for the hierarchical data model.
• In 1970, Edgar Codd, at IBM's San Jose Research Laboratory, proposed a new data representation framework called the relational data model. (Codd won the 1981 Turing Award for this seminal work.)
• In the 1980s, the relational model consolidated its position as the dominant DBMS paradigm, and database systems continued to gain widespread use.
• In the late 1980s and the 1990s, advances were made in many areas of database systems. Specialized systems were developed for creating data warehouses, consolidating data from several databases, and carrying out specialized analysis.
• An interesting phenomenon is the emergence of several enterprise resource planning (ERP) and management resource planning (MRP) packages, which add a substantial layer of application-oriented features on top of a DBMS.

3. Files vs. DBMS
• To understand the need for a DBMS, consider a motivating scenario: a company has a large collection (say, 500GB) of data on employees, departments, products, sales, and so on. This data is accessed concurrently by several employees. Questions about the data must be answered quickly, changes made to the data by different users must be applied consistently, and access to certain parts of the data (e.g., salaries) must be restricted.
• We can try to deal with this data management problem by storing the data in a collection of operating system files. This approach has many drawbacks, including the following:
  - We probably do not have 500GB of main memory to hold all the data.
  - Even if we have 500GB of main memory, on computer systems with 32-bit addressing we cannot refer directly to more than about 4GB of data.
  - We have to write special programs to answer each question that users may want to ask about the data. These programs are likely to be complex because of the large volume of data to be searched.
  - We must protect the data from inconsistent changes made by different users accessing the data concurrently.
  - We must ensure that data is restored to a consistent state if the system crashes while changes are being made.
  - Operating systems provide only a password mechanism for security. This is not sufficiently flexible to enforce security policies in which different users have permission to access different subsets of the data.

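The 32-bit limit in the list above is simple arithmetic. A minimal sketch, assuming "GB" is read as 2^30 bytes; the variable names are illustrative and do not come from the slides:

```python
# Illustrative arithmetic only; all names here are assumptions, not from the slides.
ADDRESS_BITS = 32
GIB = 2 ** 30                            # bytes in one gigabyte (binary convention)

addressable_bytes = 2 ** ADDRESS_BITS    # distinct byte addresses a 32-bit pointer can form
dataset_bytes = 500 * GIB                # the 500GB collection from the scenario

addressable_gb = addressable_bytes // GIB
# How many 4GB address ranges it would take to cover the whole dataset.
windows_needed = dataset_bytes // addressable_bytes

print(addressable_gb, windows_needed)    # 4 125
```

So even with enough physical memory, a program using 32-bit addresses could see only about 1/125 of the data at a time and would have to stage the rest itself.
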
In short, an application that keeps its data in files must handle all of the following itself:
• Staging large datasets between main memory and secondary storage (e.g., buffering, page-oriented access, 32-bit addressing)
• Special code for different queries
• Protecting data from inconsistency due to multiple concurrent users
• Crash recovery
• Security and access control

A DBMS is a piece of software designed to make the preceding tasks easier. By storing data in a DBMS rather than as a collection of operating system files, we can use the DBMS's features to manage the data in a robust and efficient manner.

4. Advantages of a DBMS
Why Use a DBMS?
Using a DBMS to manage data has many advantages:
• Data independence and efficient access.
• Reduced application development time.
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.


Why Study Databases?
• Shift from computation to information
  - at the “low end”: scramble to web space (a mess!)
  - at the “high end”: scientific applications
• Datasets increasing in diversity and volume.
  - Digital libraries, interactive video, Human Genome project, EOS (Earth Observation System) project
  - ... need for DBMS exploding
• DBMS encompasses most of CS
  - OS, languages, theory, AI, multimedia, logic


5. Describing and storing data in a DBMS
Data Models
• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using a given data model.
• The relational model of data is the most widely used model today.
  - Main concept: relation, basically a table with rows and columns.
  - Every relation has a schema, which describes the columns, or fields.


Levels of Abstraction
• Many views, a single conceptual (logical) schema, and a single physical schema.
  - Views describe how users see the data.
  - The conceptual schema defines the logical structure.
  - The physical schema describes the files and indexes used.
• Schemas are defined using a data definition language (DDL); data is modified/queried using a data manipulation language (DML).

[Figure: View 1, View 2, and View 3 sit on top of the Conceptual Schema, which sits on top of the Physical Schema.]


Example: University Database
• Conceptual schema:
  - Students(sid: string, name: string, login: string, age: integer, gpa: real)
  - Courses(cid: string, cname: string, credits: integer)
  - Enrolled(sid: string, cid: string, grade: string)
• Physical schema:
  - Relations stored as unordered files.
  - Index on first column of Students.
• External schema (view):
  - Course_info(cid: string, enrollment: integer)

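The three levels of the university example can be tried out with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the slides give the signature of Course_info but not its definition, so the view query, the index name `students_sid_idx`, and the sample rows are all made up for illustration.

```python
import sqlite3

def build_university_db():
    """Build the example schema in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conceptual schema, declared with DDL.
        CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
        CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
        CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

        -- Physical schema choice: an index on the first column of Students.
        CREATE INDEX students_sid_idx ON Students (sid);

        -- External schema: a view exposing only per-course enrollment
        -- (assumed definition; the slides list only the view's columns).
        CREATE VIEW Course_info AS
            SELECT c.cid AS cid, COUNT(e.sid) AS enrollment
            FROM Courses c LEFT JOIN Enrolled e ON e.cid = c.cid
            GROUP BY c.cid;

        -- Made-up sample rows, inserted with DML.
        INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4),
                                    ('53688', 'Smith', 'smith@ee', 18, 3.2);
        INSERT INTO Courses  VALUES ('CS564', 'Database Systems', 4),
                                    ('CS101', 'Intro to Programming', 3);
        INSERT INTO Enrolled VALUES ('53666', 'CS564', 'A'),
                                    ('53688', 'CS564', 'B');
    """)
    return conn

conn = build_university_db()
course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
print(course_info)  # [('CS101', 0), ('CS564', 2)]
```

A user of Course_info sees only course ids and head counts; nothing about grades, student records, or the index is visible at that level.
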
Data Independence
• Application programs are insulated from how data is structured and stored.
• Logical data independence: if the conceptual schema is changed (that is, the underlying data is reorganized), the definition of a view relation can be modified so that the same relation is computed as before. Users are thus shielded from changes in the logical structure of the data.
  Example: suppose a relation Student(sid, sname, gpa) is replaced by StudentName(sid, sname) and Studentgpas(sid, gpa). Application programs that operate on the Student relation can be shielded from this change by defining a view Student(sid, sname, gpa) over the two new relations.
• One of the most important benefits of using a DBMS!

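The Student example can be sketched with `sqlite3`. The join used to define the view, the sample rows, and the helper `names_with_gpa_at_least` are assumptions made for illustration; the slides name only the relations involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- After the reorganization, the data lives in two relations.
    CREATE TABLE StudentName (sid TEXT PRIMARY KEY, sname TEXT);
    CREATE TABLE Studentgpas (sid TEXT PRIMARY KEY, gpa REAL);
    INSERT INTO StudentName VALUES ('53666', 'Jones'), ('53688', 'Smith');
    INSERT INTO Studentgpas VALUES ('53666', 3.4), ('53688', 3.2);

    -- The view recomputes the old Student(sid, sname, gpa) relation.
    CREATE VIEW Student AS
        SELECT n.sid AS sid, n.sname AS sname, g.gpa AS gpa
        FROM StudentName n JOIN Studentgpas g ON g.sid = n.sid;
""")

def names_with_gpa_at_least(conn, min_gpa):
    """Application code written against Student; it never mentions the split."""
    rows = conn.execute(
        "SELECT sname FROM Student WHERE gpa >= ? ORDER BY sname", (min_gpa,)
    )
    return [sname for (sname,) in rows]

print(names_with_gpa_at_least(conn, 3.3))  # ['Jones']
```

The query inside the function is exactly what it would have been against the original single table. Note that this covers reads only: SQLite views are read-only, so programs that update Student would need additional machinery.
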
• Physical data independence: the conceptual schema insulates users from changes in the physical storage of the data. We can change these storage details without altering applications. (Of course, performance might be affected by such changes.)
  Example: we could store Student tuples in a heap file with an index on the sname field, or with an index on the gpa field instead. These storage choices are not visible to users, except through their effect on performance, since users simply see a relation as a set of tuples.
• Like logical data independence, this is one of the most important benefits of using a DBMS!

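A small `sqlite3` sketch of the same idea: the query text stays fixed while the storage changes underneath it. The sample rows, the index name `student_gpa_idx`, and the helper `plan` are illustrative assumptions; the plan text that SQLite reports differs between versions, so the sketch prints it without relying on its wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (sid TEXT, sname TEXT, gpa REAL);
    INSERT INTO Student VALUES ('53666', 'Jones', 3.4),
                               ('53688', 'Smith', 3.2),
                               ('53650', 'Smith', 3.8);
""")

QUERY = "SELECT sid FROM Student WHERE gpa > 3.3 ORDER BY sid"

def plan(conn):
    """Return SQLite's description of how it would run QUERY."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

before = conn.execute(QUERY).fetchall()
plan_before = plan(conn)

# Physical change only: add an index on gpa. The query text is untouched.
conn.execute("CREATE INDEX student_gpa_idx ON Student (gpa)")

after = conn.execute(QUERY).fetchall()
plan_after = plan(conn)

print(before == after)  # True: same answer under either storage choice
print(plan_before, plan_after, sep="\n")
```

Only the access path, and therefore the running time on large tables, is allowed to differ between the two runs.
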
6. TRANSACTION MANAGEMENT
Concurrency Control
• Concurrent execution of user programs is essential for good DBMS performance.
  - Because disk accesses are frequent and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently.
• Interleaving the actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
• The DBMS ensures that such problems don't arise: users can pretend they are using a single-user system.
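The kind of inconsistency meant here can be reproduced with a tiny, fully deterministic simulation. It is a sketch of a closely related case rather than the slide's exact check-clearing example: T1 adds up two account balances while T2 moves 50 from account A to account B. The accounts, amounts, and step names are invented for illustration.

```python
def run_schedule(schedule):
    """Run the listed steps in order and return the total that T1 reports."""
    db = {"A": 100, "B": 100}     # two accounts; the true total is always 200
    t1_seen = {}                  # values T1 has read so far
    for step in schedule:
        if step == "T1 reads A":
            t1_seen["A"] = db["A"]
        elif step == "T1 reads B":
            t1_seen["B"] = db["B"]
        elif step == "T2 debits A":
            db["A"] -= 50
        elif step == "T2 credits B":
            db["B"] += 50
        else:
            raise ValueError(f"unknown step: {step}")
    return t1_seen["A"] + t1_seen["B"]

# T1 runs to completion before T2 starts.
serial = run_schedule(["T1 reads A", "T1 reads B", "T2 debits A", "T2 credits B"])
# T2 slips in between T1's two reads.
interleaved = run_schedule(["T1 reads A", "T2 debits A", "T2 credits B", "T1 reads B"])

print(serial, interleaved)  # 200 250
```

In the interleaved run T1 reports 250 although the accounts never held more than 200 in total. Preventing such outcomes while still allowing interleaving is the job of concurrency control.
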


Transaction: An Execution of a DB Program
• The key concept is the transaction, an atomic sequence of database actions (reads/writes).
• Each transaction, executed completely, must leave the DB in a consistent state if the DB is consistent when the transaction begins.
  - Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints.
  - Beyond this, the DBMS does not really understand the semantics of the data (e.g., it does not understand how the interest on a bank account is computed).
  - Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user's responsibility!

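The division of labor described above can be sketched with `sqlite3`. The Account table, its CHECK constraint, and the `transfer` helper are assumptions made for illustration. The DBMS enforces the declared constraint (no negative balance) and the all-or-nothing behavior; that a transfer must debit and credit the same amount is something only the programmer knows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The CHECK clause is a simple integrity constraint the DBMS enforces.
    CREATE TABLE Account (id TEXT PRIMARY KEY,
                          balance INTEGER CHECK (balance >= 0));
    INSERT INTO Account VALUES ('a', 100), ('b', 20);
""")

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
            conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM Account"))

too_much = transfer(conn, "a", "b", 150)  # the debit would make a's balance negative
after_failed = balances(conn)             # the credit to b was rolled back as well
ok = transfer(conn, "a", "b", 30)
after_ok = balances(conn)

print(too_much, after_failed)  # False {'a': 100, 'b': 20}
print(ok, after_ok)            # True {'a': 70, 'b': 50}
```

The credit is deliberately issued before the debit so that the failed transfer has already changed the database when the constraint fires; the rollback then removes that partial effect.
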
Scheduling Concurrent Transactions
• The DBMS ensures that the execution of {T1, ..., Tn} is equivalent to some serial execution of the same transactions, one after another in some order.
  - Before reading/writing an object, a transaction requests a lock on the object and waits until the DBMS grants it. All locks are released at the end of the transaction. (Locking protocol: shared locks for reads, exclusive locks for writes.)
  - Idea: if an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first, and Tj is forced to wait until Ti completes; this effectively orders the transactions.
  - What if Tj already holds a lock on Y, and Ti later requests a lock on Y while Tj is waiting for X? Each waits for the other (deadlock!), so Ti or Tj is aborted and restarted.
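The locking rules and the deadlock case above can be sketched as a toy lock table. This is a simplified illustration, not how any particular DBMS implements locking: the class `LockManager`, its method names, and the waits-for cycle check are assumptions, and instead of actually blocking a transaction the sketch just reports whether a request would be granted, would have to wait, or would close a deadlock cycle.

```python
class LockManager:
    """Toy lock table: shared/exclusive locks, all released together at the end."""

    def __init__(self):
        self.holders = {}     # object name -> {transaction: "shared" or "exclusive"}
        self.waits_for = {}   # blocked transaction -> transactions it waits for

    def request(self, txn, obj, mode):
        """Return "granted", "wait", or "deadlock" for a lock request."""
        held = self.holders.setdefault(obj, {})
        # Another transaction's lock conflicts unless both locks are shared.
        blockers = {t for t, m in held.items()
                    if t != txn and "exclusive" in (mode, m)}
        if not blockers:
            if held.get(txn) != "exclusive":   # never downgrade an exclusive lock
                held[txn] = mode
            self.waits_for.pop(txn, None)
            return "granted"
        self.waits_for[txn] = blockers
        return "deadlock" if self._waits_for_itself(txn) else "wait"

    def _waits_for_itself(self, start):
        """Depth-first search for a cycle through start in the waits-for graph."""
        stack, seen = list(self.waits_for.get(start, ())), set()
        while stack:
            t = stack.pop()
            if t == start:
                return True
            if t not in seen:
                seen.add(t)
                stack.extend(self.waits_for.get(t, ()))
        return False

    def release_all(self, txn):
        """Called at commit or abort: drop every lock txn holds."""
        for held in self.holders.values():
            held.pop(txn, None)
        self.waits_for.pop(txn, None)
        for waiting_on in self.waits_for.values():
            waiting_on.discard(txn)

lm = LockManager()
steps = [
    lm.request("Ti", "X", "exclusive"),   # Ti writes X
    lm.request("Tj", "Y", "exclusive"),   # Tj writes Y
    lm.request("Tj", "X", "shared"),      # Tj wants to read X: must wait for Ti
    lm.request("Ti", "Y", "exclusive"),   # Ti wants Y: Ti and Tj wait for each other
]
lm.release_all("Ti")                      # resolve the deadlock by aborting Ti
retry = lm.request("Tj", "X", "shared")

print(steps)  # ['granted', 'granted', 'wait', 'deadlock']
print(retry)  # granted
```

Once Ti is aborted and its locks are released, Tj's request for X succeeds, which is the "aborted and restarted" resolution mentioned in the last bullet.
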


Ensuring Atomicity
• The DBMS ensures atomicity (the all-or-nothing property) even if the system crashes in the middle of a transaction.
• Idea: keep a log (history) of all actions carried out by the DBMS while executing a set of transactions:
  - Before a change is made to the database, the corresponding log entry is forced to a safe location. (This is the WAL (Write-Ahead Log) protocol; OS support for it is often inadequate.)
  - After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if a log entry wasn't saved before the crash, the corresponding change was not applied to the database!)
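The log-then-change rule and the undo pass can be illustrated with a toy in-memory model. Everything in it is an assumption made for illustration: the class `ToyDB`, the record layout, and the simplification that both the log and the "disk" survive a crash while writes go straight to disk. A real DBMS also has a buffer pool, redo processing, and checkpoints, none of which are modeled here.

```python
class ToyDB:
    """Toy model of write-ahead logging with undo-only recovery."""

    def __init__(self, initial):
        self.disk = dict(initial)  # database contents that survive a crash
        self.log = []              # log records, assumed to be on stable storage

    def write(self, txn, key, new_value):
        # WAL rule: record the old value in the log BEFORE changing the data.
        self.log.append(("update", txn, key, self.disk.get(key)))
        self.disk[key] = new_value

    def commit(self, txn):
        self.log.append(("commit", txn))

    def recover(self):
        """After a crash, undo the updates of transactions that never committed."""
        committed = {rec[1] for rec in self.log if rec[0] == "commit"}
        for rec in reversed(self.log):          # undo in reverse order
            if rec[0] == "update" and rec[1] not in committed:
                _, _, key, old_value = rec
                if old_value is None:           # key did not exist before the write
                    self.disk.pop(key, None)
                else:
                    self.disk[key] = old_value

db = ToyDB({"A": 100, "B": 100})

# T1 transfers 50 from A to B and commits.
db.write("T1", "A", 50)
db.write("T1", "B", 150)
db.commit("T1")

# T2 starts a second transfer but the system crashes after its first write.
db.write("T2", "A", 0)
state_at_crash = dict(db.disk)

db.recover()
print(state_at_crash)  # {'A': 0, 'B': 150}
print(db.disk)         # {'A': 50, 'B': 150}
```

Because every change was preceded by a log record holding the old value, recovery can restore A to 50 and leave the committed work of T1 untouched.
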


Points to Note
• In summary, there are three points to remember with respect to DBMS support for concurrency control and recovery:
  - Every object that is read or written by a transaction is first locked in shared or exclusive mode, respectively. Placing a lock on an object restricts its availability to other transactions and thereby affects performance.
  - For efficient log maintenance, the DBMS must be able to selectively force a collection of pages in main memory to disk. Operating system support for this operation is not always satisfactory.
  - Periodic checkpointing can reduce the time needed to recover from a crash. Of course, this must be balanced against the fact that checkpointing too often slows down normal execution.


Databases make these folks happy ...
• End users and DBMS vendors
• DB application programmers
  - E.g., smart webmasters
• Database administrator (DBA)
  - Designs logical/physical schemas
  - Handles security and authorization
  - Data availability, crash recovery
  - Database tuning as needs evolve
Must understand how a DBMS works!

7. Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• This is one of several possible architectures; each system has its own variations.

[Figure: layers of a DBMS, from top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Annotation: "These layers must consider concurrency control and recovery."]




Summary
• A DBMS is used to maintain and query large datasets.
• Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs and are well-paid!
• DBMS R&D is one of the broadest, most exciting areas in CS.


Assignments
Question 1. Why would you choose a database system instead of storing data in files managed by the operating system? When should you not use a database system?
Question 2. What is logical data independence? Why is it important?
Question 3. Explain the difference between logical data independence and physical data independence. Give an example.
Question 4. Explain the differences between the conceptual (logical) schema, the physical (internal) schema, and the external schema.
Question 5. What are the responsibilities of a DBA? Suppose the DBA is never interested in running his or her own queries. Does the DBA still need to understand query optimization? Why?
Question 6. Mr. A needs to buy a database system. To save money, he wants one with the fewest possible features. He plans to run it by himself on his PC and not share information with anyone. For each of the following DBMS features, say whether the system Mr. A buys should include it, and why:
+ Security facilities
+ Concurrency control
+ Crash recovery
+ View mechanism
+ Query language
Question 7. Describe the structure of a DBMS. Suppose your operating system is upgraded to support some new file-related functions (for example, the ability to store a sequence of bytes on disk). Which layer(s) of the DBMS would you have to rewrite to take advantage of these new functions?
Question 8. Answer the following questions:
1. What is a transaction?
2. Why does a DBMS interleave the actions of different transactions instead of executing transactions one after another?
3. What must a user guarantee with respect to a transaction and database consistency? What should a DBMS guarantee with respect to the concurrent execution of several transactions and database consistency?
4. Explain the strict two-phase locking protocol.
5. What is the WAL property, and why is it important?
