Unit - 1
https://www.tutorialcup.com/dbms/file-processing-system.htm
DBMS
Traditional Approach for Data Storage and the Need of DBMS
2. Application programs go through the file system to access these flat
files.
• The end of each record and the end of each file are marked with a
predetermined character set or special characters so that they can be identified.
1. Data Mapping and Access: - Although related information is
grouped and stored in different files, there is no mapping between any
two files; that is, two dependent files are not linked. Even though the
Student and Student_Report files are related, they are two different
files with no link between them. Hence, to display a student's details
along with his report, we cannot read directly from those two files. We
have to write a lengthy program that searches the Student file first,
gets all the details, and then searches the Student_Report file for his
report.
3. Data Dependence: - In the files, data is stored in a specific format, say
tab-, comma- or semicolon-separated. If the format of any file changes, the
program that processes that file must change too. But many programs may
depend on the same file. We need to know in advance every program that
uses the file and change each of them. Missing even one place can make the
whole application fail. Similarly, a change in the storage structure or in
the way the data is accessed affects every place where the file is used.
That is, even the smallest change to the file requires changes in all the
programs that use it.
4. Data Inconsistency: - Imagine that both the Student and Student_Report
files contain the student’s address, and there is a change request for one
particular student’s address. The program searches only the Student file
for the address and updates it correctly. Another program prints the
student’s report and mails it to the address given in the Student_Report
file. What happens to the report of the student whose address has changed?
The two addresses no longer match, and his report is sent to his old
address. This mismatch between different copies of the same data is called
data inconsistency. It occurs here because there is no proper record of
which files hold copies of the same data.
5. Data Isolation: - Imagine we have to generate a single report for a
student in a particular class, covering his study report, his library book
details, and his hostel information. All this information is stored in
different files. How do we get all these details into one report? We have
to write a program. But before writing it, the programmer must find out
which files hold the information needed, what the format of each file is,
how to search the data in each file, and so on. Only once all this analysis
is done can he write the program. If 2-3 files are involved, the
programming is fairly simple. But what if many files are involved? It
would require a lot of effort from the programmer. Since the data is
isolated in different files, programming becomes difficult.
6. Security: - Each file can be password protected. But what if we have to
give access to only a few records in the file? For example, a user has to
be given access to view only his own bank account information in the file.
This is very difficult in the file system.
7. Integrity: - If we need to check certain criteria while inserting data
into a file, the file system cannot do it directly. We can do it only by
writing programs. Say we have to restrict the students above age 18; that
can be done by means of a program alone. There is no direct checking
facility in the file system. Hence these kinds of integrity checks are not
easy in the file system.
8. Atomicity: - If an insert, update or delete fails in the file system,
there is no mechanism to switch back to the previous state. Imagine that
marks for one particular subject need to be entered into the Report file
and then the total needs to be calculated. But after the new marks are
entered, the file is closed without saving. That means the whole of the
required transaction is not performed: the totaling of marks has been done,
but the addition of the new marks has not. The total calculated is wrong in
this case. Atomicity means that a transaction either completes as a whole
or does not happen at all. Partial completion of any transaction leads to
incorrect data in the system. The file system does not guarantee atomicity.
It may be possible with complex programs, but introducing them for each
transaction is costly.
9. Concurrent Access: - Accessing the same data from the same file at the
same time is called concurrent access. In the file system, concurrent
access can lead to incorrect data. For example, a student wants to borrow
a book from the library. He searches for the book in the library file and
sees that only one copy is available. At the same time, another student
also wants to borrow the same book and sees that one copy is available.
The first student opts to borrow and gets the book. But the count has not
yet been updated to zero in the file, so the second student also opts to
borrow, even though no copies are left. This is the problem of concurrent
access in the file system.
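The concurrent access problem in point 9 can be reproduced in a few lines. The following is only a sketch: the dictionary library_file stands in for the library file, and the names check_copies and borrow are hypothetical. Both students read the count before either of them writes it back.

```python
# Hypothetical in-memory stand-in for the library file: title -> copies available.
library_file = {"Database Basics": 1}

def check_copies(title):
    """Step 1 of borrowing: read the number of available copies."""
    return library_file[title]

def borrow(title, copies_seen):
    """Step 2 of borrowing: write back a count based on the earlier read."""
    if copies_seen > 0:
        library_file[title] = copies_seen - 1
        return True
    return False

# Both students read before either one writes, as in the scenario above.
seen_by_first = check_copies("Database Basics")
seen_by_second = check_copies("Database Basics")

first_got_book = borrow("Database Basics", seen_by_first)
second_got_book = borrow("Database Basics", seen_by_second)

# Two successful borrows were recorded although only one copy existed.
print(first_got_book, second_got_book, library_file["Database Basics"])
```

Nothing in the file system stops the second write, because the check and the update are two separate steps that other programs can interleave with.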
All the files in the file processing system are known as tables in the database.
In a database, each column is known as an attribute and each row of
information is known as a record.
There is no difference in the data being stored. A database differs from the file
system in the way the data is stored and accessed.
In the database, each set of information is stored in the form of rows and
columns.
We define a unique key column for each record, known as the primary key.
Using the primary key, we can access the data much faster than in the file system. We can
also define a mapping between any two related tables. This helps in reducing
unnecessary data storage and in retrieving data faster.
Advantages of DBMS
1. Data Mapping and Access: - DBMS defines the way to map any
two related tables by means of a primary key-foreign key
relationship. The primary key is the column in a table that is
responsible for uniquely identifying each record in the table.
A foreign key is a column in a table that is the primary key of
another table and through which the entries in the current table
are related to that other table.
We can see the difference in the way data is stored in the file
and database systems. Primary and foreign keys are defined, and
unnecessary columns are removed from the STUDENT_REPORT
table in the database system. These are missing in the file
processing system.
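A minimal sketch of the primary key-foreign key mapping, using Python's built-in sqlite3 module. The column names (student_id, address, marks and so on) and the sample rows are assumptions made for illustration; the notes name only the STUDENT and STUDENT_REPORT tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Assumed columns: the address lives only in student, so there is one copy to update.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        address    TEXT
    )""")
conn.execute("""
    CREATE TABLE student_report (
        report_id  INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        subject    TEXT NOT NULL,
        marks      INTEGER
    )""")

conn.execute("INSERT INTO student VALUES (1, 'Asha', '12 Lake Road')")
conn.execute("INSERT INTO student_report VALUES (10, 1, 'Maths', 82)")

# The mapping lets one query pick details from both tables.
rows = conn.execute("""
    SELECT s.name, s.address, r.subject, r.marks
    FROM student AS s
    JOIN student_report AS r ON r.student_id = s.student_id
""").fetchall()
print(rows)

# A report for a student that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO student_report VALUES (11, 99, 'Physics', 70)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Because the report table stores only the student's key, the lengthy two-file search described for the file system becomes a single join.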
None of these levels of security and access is available in the file
system.
Disadvantages of DBMS
What is Database?
A database is a computer-based record-keeping system used to
record, maintain and retrieve data. It is an organized collection of interrelated
(persistent) data.
1. Centralized Database
1. Data management
2. Data definition
3. Transaction support
4. Concurrency control
5. Recovery
6. Security and integrity
7. Facilities to import and export data
8. User management
9. Backup
10. Performance analysis
11. Logging
12. Audit
13. Physical storage control
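Transaction support (item 3 in the list above) can be illustrated with the marks-and-total scenario. This is a sketch under assumptions: the tables marks and totals, the function add_marks and the fail_midway switch are hypothetical, and SQLite via Python's sqlite3 module stands in for a DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE marks (student_id INTEGER, subject TEXT, marks INTEGER)")
conn.execute("CREATE TABLE totals (student_id INTEGER PRIMARY KEY, total INTEGER)")
conn.execute("INSERT INTO marks VALUES (1, 'Maths', 80)")
conn.execute("INSERT INTO totals VALUES (1, 80)")
conn.commit()

def add_marks(conn, student_id, subject, marks, fail_midway=False):
    """Insert new marks and refresh the total as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("INSERT INTO marks VALUES (?, ?, ?)",
                         (student_id, subject, marks))
            if fail_midway:
                raise RuntimeError("simulated crash between the two steps")
            conn.execute(
                "UPDATE totals SET total = "
                "(SELECT SUM(marks) FROM marks WHERE student_id = ?) "
                "WHERE student_id = ?", (student_id, student_id))
        return True
    except RuntimeError:
        return False

def snapshot(conn):
    """Return (number of mark rows, stored total) for student 1."""
    count = conn.execute("SELECT COUNT(*) FROM marks").fetchone()[0]
    total = conn.execute("SELECT total FROM totals").fetchone()[0]
    return count, total

add_marks(conn, 1, "Physics", 70, fail_midway=True)
after_failure = snapshot(conn)   # the half-done insert has been rolled back

add_marks(conn, 1, "Physics", 70)
after_success = snapshot(conn)   # both steps applied together

print(after_failure, after_success)
```

After the simulated failure the new marks row is gone again, so the stored total never disagrees with the marks it is computed from.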
View of Data : Database Abstraction
A database is full of data and records.
What we see as rows and columns looks quite different once it reaches
storage. When the data is stored on media such as disks or tapes, it is
stored in the form of bits. But users will not understand these bits; they
need to see the actual data to understand it.
Not all the details about how the data is stored are necessary for the
users, either. A user needs only the little information that he is
interested in or wants to work with.
Masking the unwanted data from the users happens at different levels in
the database. This masking of data is called data abstraction. There are
three levels of data abstraction: external (view), logical/conceptual, and
physical.
Any changes or computations done at this level will not affect the other
levels of data. That means that if we retrieve a few columns of the
STUDENT table, the whole table does not change, and if we calculate
the CGPA of a student, the table is not changed or updated. This level
of data is based on the levels below it, but it does not alter the data at
those levels.
Logical/ Conceptual level –
This is the next level of abstraction. It describes the actual data
stored in the database in the form of tables and relates them by
means of mapping.
This level does not have any information on what a user views at
the external level. This level has all the data in the database.
Any changes done at this level will not affect the external or physical
levels of data. That is, a change to a table structure or a
relation will not modify the data that the user sees in the external
view or the storage at the physical level. For example, suppose we
add a new column ‘Skills’. It will not modify the external
view in which the user was viewing the ages of the students.
Similarly, space will be allocated for ‘Skills’ in physical
storage, but the space or address of Date of Birth
(from which Age is derived) will not change. Hence external and
physical independence is achieved.
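The ‘Skills’ example can be tried out with a view standing in for the external level. This is a sketch only: the view name student_age, the column names and the fixed reference date 2024-01-01 are assumptions chosen so that the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT, date_of_birth TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', '2005-03-14')")

# External level: the user sees only the name and an age derived from date_of_birth.
# One year is subtracted when the birthday has not yet occurred in the reference year.
conn.execute("""
    CREATE VIEW student_age AS
    SELECT name,
           CAST(strftime('%Y', '2024-01-01') AS INTEGER)
             - CAST(strftime('%Y', date_of_birth) AS INTEGER)
             - (strftime('%m-%d', '2024-01-01') < strftime('%m-%d', date_of_birth)) AS age
    FROM student
""")
before = conn.execute("SELECT * FROM student_age").fetchall()

# Logical level change: a new column is added to the table.
conn.execute("ALTER TABLE student ADD COLUMN skills TEXT")
after = conn.execute("SELECT * FROM student_age").fetchall()

# The external view returns the same rows before and after the change.
print(before, after)
```

Reading the view also leaves the STUDENT table untouched, which matches the earlier point that computations at the external level do not alter the levels below.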
2-tier Architecture
In 2-tier architecture, the application program interacts directly with the database. There is no user
interface or user involved in the database interaction. Imagine a front-end application of a school, where we
need to display the reports of all the students who have opted for different subjects. In this case, the application
interacts directly with the database and retrieves all the required data. Here no inputs from the user are required.
This is a 2-tier architecture of the database.
Let us consider another example of two-tier architecture. Consider a railway ticket reservation system. How
does this work? Imagine a person is reserving a ticket from Delhi to Goa on a particular day. At the same time
another person in some other part of Delhi is also reserving a ticket to Goa on the same day for the same
train. Now there is a requirement for two tickets, for two different persons. What will the reservation system do? It
takes the requests from both of them and queues them. Here each request is
entered at the application layer and sent to the database layer. Once the request is processed in the database,
the result is sent back to the application layer for the user.
Advantages of 2-tier Architecture
Easy to understand, as the application communicates directly with the database.
Requested data can be retrieved very quickly when there are few users.
Easy to modify – any required changes can be sent directly to the database.
Easy to maintain – when there are multiple requests, they are handled in a queue, so there is no
chaos.
3-tier Architecture
3-tier architecture is the most widely used database architecture. It can be viewed as below.
Presentation layer / User layer is the layer where the user uses the database. He does not have any
knowledge about the underlying database. He simply interacts with the database as though he has all
the data in front of him. You can imagine this layer as a registration form where you input
your details. Did you ever wonder where the data goes after you press the ‘submit’ button? Probably not.
You just know that your details are saved. This is the presentation layer, where all the details from
the user are taken and sent to the next layer for processing.
Application layer is the underlying program which is responsible for saving the details that you have
entered and retrieving your details to show on the page. This layer has all the business logic,
such as validation, calculations and manipulation of data, and it sends the requests to the database to
get the actual data. If this layer sees that a request is invalid, it sends a message back to the
presentation layer and does not hit the database layer at all.
Data layer or Database layer is the layer where the actual database resides. All the tables,
their mappings and the actual data are present in this layer. When you save your details from the front end, they are
inserted into the respective tables in the database layer by the programs in the application
layer. When you want to view your details in the web browser, a request is sent to the database layer
by the application layer. The database layer fires queries and gets the data. This data is then
transferred to the browser (presentation layer) by the programs in the application layer.
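The three layers can be sketched in a single Python file. Everything named here (DataLayer, ApplicationLayer, submit_form, the registration table and the validation rule) is hypothetical; in a real system the layers would normally run as separate programs, often on separate machines.

```python
import sqlite3

class DataLayer:
    """Data layer: owns the tables and runs the queries."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE registration (name TEXT NOT NULL, email TEXT NOT NULL)")

    def save(self, name, email):
        with self.conn:  # commit the insert as one transaction
            self.conn.execute("INSERT INTO registration VALUES (?, ?)", (name, email))

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM registration").fetchone()[0]

class ApplicationLayer:
    """Application layer: validates a request before it reaches the database."""
    def __init__(self, data_layer):
        self.data_layer = data_layer

    def register(self, name, email):
        if not name.strip() or "@" not in email:
            return "invalid request"  # sent back without touching the data layer
        self.data_layer.save(name.strip(), email)
        return "saved"

def submit_form(app, form):
    """Presentation layer: collects the form input and returns the reply to show."""
    return app.register(form.get("name", ""), form.get("email", ""))

data = DataLayer()
app = ApplicationLayer(data)
good_reply = submit_form(app, {"name": "Asha", "email": "asha@example.com"})
bad_reply = submit_form(app, {"name": "", "email": "nobody"})
print(good_reply, bad_reply, data.count())
```

The presentation function never sees SQL, and the invalid form never reaches the database, which is the separation the three layers are meant to give.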
Advantages of a DBMS
1. Data independence
2. Reduced data redundancy
3. Increased security
4. Better flexibility
5. Effective data sharing
6. Enforces integrity constraints
7. Enables backup and recovery
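Item 6 above, enforcing integrity constraints, can be sketched with a CHECK constraint. The rule used here (age must be 18 or below) is only an assumed reading of the earlier age restriction example, and the table and function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        age        INTEGER NOT NULL CHECK (age <= 18)  -- assumed rule, for illustration
    )""")

def try_insert(student_id, name, age):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?, ?)", (student_id, name, age))
        return True
    except sqlite3.IntegrityError:
        return False

accepted_17 = try_insert(1, "Asha", 17)
accepted_21 = try_insert(2, "Ravi", 21)
print(accepted_17, accepted_21)
```

The check is declared once with the table, so every program that inserts students gets it without writing its own validation code.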
What is ER Modeling?
Entity
Anything that has an independent existence and about which we collect data.
It is also known as entity type.
Entity instance
Regular Entity
Weak entity
An entity which depends on another entity for its existence and doesn't have any
key attribute of its own is a weak entity.
Domain of Attributes
The set of possible values that an attribute can take is called the domain of the
attribute. For example, the attribute day may take any value from the set
{Monday, Tuesday ... Friday}. Hence this set can be termed as the domain of
the attribute day.
Key attribute
The attribute (or combination of attributes) which is unique for every entity
instance is called key attribute.
Simple attribute
Example of a composite attribute: the name of an employee, which can be split
into First_name, Middle_name, and Last_name.
If an attribute can take only a single value for each entity instance, it is a
single-valued attribute.
Example of a single-valued attribute: the age of a student. It can take only one
value for a particular student.
Multi-valued Attributes
If an attribute can take more than one value for each entity instance, it is a
multi-valued attribute. Multi-valued
Stored Attribute
Derived Attribute
Example of a derived attribute: the age of an employee, which can be calculated
from the date of birth and the current date.
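The derived attribute example can be written as a small function. This is a sketch: the function name age_on is hypothetical, and the current date is passed in as a parameter so that the result can be checked.

```python
from datetime import date

def age_on(date_of_birth: date, today: date) -> int:
    """Derive age in completed years from the stored date of birth."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

age_before_birthday = age_on(date(1990, 6, 15), date(2024, 6, 14))
age_on_birthday = age_on(date(1990, 6, 15), date(2024, 6, 15))
print(age_before_birthday, age_on_birthday)  # 33 34
```

Only the date of birth needs to be stored; the age is computed whenever it is needed, so it can never go out of date.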
Degree of a Relationship
Degree of a relationship is the number of entity types involved. The n-ary
relationship is the general form for degree n. Special cases are unary, binary,
and ternary, where the degree is 1, 2, and 3, respectively.
Example of a ternary relationship: a customer purchases an item from a shopkeeper.
Cardinality of a Relationship
One employee works in only one organization, but one organization can have
many employees. Hence it is an M:1 relationship and the cardinality is
Many-to-One (M:1).
One student can enroll in many courses and one course can be taken by
many students. Hence it is an M:N relationship and the cardinality is
Many-to-Many (M:N).
1. Total
2. Partial
Here, not all employees will be the head of the department. Only one employee
will be the head of the department. In other words, only a few instances of
the employee entity participate in the above relationship. So the employee entity's
participation is partial in the said relationship.
Advantages
Disadvantages
1. Physical design derived from the E-R Model may contain some
ambiguities or inconsistencies.
Here we are going to design an Entity Relationship (ER) model for a college
database. Say we have the following statements.
1. Department
2. Course
3. Instructor
4. Student
Step 2: Identify the relationships
1. One department offers many courses, but one particular course can be
offered by only one department. Hence the cardinality between department
and course is One to Many (1:N).
2. One department has multiple instructors, but an instructor belongs to only
one department. Hence the cardinality between department and instructor
is One to Many (1:N).
3. One department has only one head and one head can be the head of only
one department. Hence the cardinality is One to One (1:1).
4. One course can be taken by many students and one student can enroll
in many courses. Hence the cardinality between course and student is
Many to Many (M:N).
5. One course is taught by only one instructor, but one instructor teaches
many courses. Hence the cardinality between course and instructor is Many
to One (N:1).
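The five relationships above can be sketched as tables to see how each cardinality is represented. All column names, the linking tables department_head and enrollment, and the sample rows are assumptions; in particular, the sketch assumes that the head of a department is one of the instructors, which the statements above do not say.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE instructor (
        instructor_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        dept_id       INTEGER NOT NULL REFERENCES department(dept_id)  -- 1:N
    );
    -- 1:1: one row per department, and an instructor can head at most one
    CREATE TABLE department_head (
        dept_id       INTEGER PRIMARY KEY REFERENCES department(dept_id),
        instructor_id INTEGER NOT NULL UNIQUE REFERENCES instructor(instructor_id)
    );
    CREATE TABLE course (
        course_id     INTEGER PRIMARY KEY,
        title         TEXT NOT NULL,
        dept_id       INTEGER NOT NULL REFERENCES department(dept_id),       -- 1:N
        instructor_id INTEGER NOT NULL REFERENCES instructor(instructor_id)  -- N:1
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    -- M:N between course and student needs a separate linking table
    CREATE TABLE enrollment (
        course_id  INTEGER NOT NULL REFERENCES course(course_id),
        student_id INTEGER NOT NULL REFERENCES student(student_id),
        PRIMARY KEY (course_id, student_id)
    );

    INSERT INTO department VALUES (1, 'Computer Science');
    INSERT INTO instructor VALUES (1, 'Dr. Rao', 1);
    INSERT INTO department_head VALUES (1, 1);
    INSERT INTO course VALUES (101, 'DBMS', 1, 1), (102, 'Networks', 1, 1);
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO enrollment VALUES (101, 1), (101, 2), (102, 1);
""")

# Number of students enrolled in each course, through the linking table.
enrolled = conn.execute("""
    SELECT c.title, COUNT(e.student_id)
    FROM course AS c
    JOIN enrollment AS e ON e.course_id = c.course_id
    GROUP BY c.course_id
    ORDER BY c.course_id
""").fetchall()
print(enrolled)
```

The "one" side of each 1:N relationship appears as a foreign key on the "many" side, while the M:N relationship is the only one that needs a table of its own.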
Step 3: Identify the key attributes