COMP 214 DATABASE SYSTEMS NOTES PDF Sem 1 2019 PDF
Introduction:
In a computerized information system, data is the basic resource of the organization, so proper
organization and management of data is required for the organization to run smoothly. The study
of database management systems covers how data is stored and managed in a computerized
information system. Any organization requires accurate and reliable data for better decision
making, for ensuring the privacy of data and for controlling data efficiently. Examples include
depositing to or withdrawing from a bank account, making a hotel, airline or railway reservation,
and purchasing items from a supermarket; in all these cases a database is accessed.
What is data:
Data is known facts or figures that have implicit meaning. It can also be defined as the
representation of facts, concepts or instructions in a formal manner that is suitable for
understanding and processing. Data can be represented using letters (A-Z, a-z), digits (0-9) and
special characters (+, -, #, $, etc.), e.g. 25, “ajit”.
Information:
Information is processed data on which decisions and actions are based. Information
can be defined as data that has been organized and classified to provide meaningful values,
e.g. “The age of Ravi is 25”.
File:
In the traditional file-oriented approach to information processing, each application has a
separate master file and its own set of personal files. In this approach the programs become
dependent on the files and the files become dependent on the programs.
1) Data redundancy and inconsistency:
The same information may be written in several files. This redundancy leads to higher storage
and access costs. It may also lead to data inconsistency, that is, the various copies of the same
data may no longer agree; for example, a changed customer address may be reflected in a single
file but not elsewhere in the system.
2) Difficulty in accessing data:
Conventional file processing systems do not allow data to be retrieved in a convenient and
efficient manner according to the user's choice.
3) Data isolation :
Because data are scattered in various files, and the files may be in different formats, writing
new application programs to retrieve the appropriate data is difficult.
4) Integrity Problems:
Developers enforce data validation in the system by adding appropriate code to the various
application programs. However, when new constraints are added, it is difficult to change the
programs to enforce them.
5) Atomicity:
It is difficult to ensure atomicity in a file processing system when a transaction fails due
to a power failure, networking problem, etc. (Atomicity: either all operations of the transaction
are reflected properly in the database or none are.)
6) Concurrent access:
In a file processing system, it is difficult for several transactions to access the same file at
the same time without risking inconsistent data.
7) Security problems:
A file processing system provides no built-in mechanism to secure the data from access by
unauthorized users.
Database:
2. It is related.
For example, consider the roll number, name and address of a student stored in a student file.
It is a collection of related data with an implicit meaning.
Persistent:
Data is persistent if it can be removed from the database only by an explicit request from a user.
Integrated:
A database can be a collection of data from different files. When the redundancy among
those files is removed, the database is said to be integrated.
Sharing Data:
The data stored in the database can be shared by multiple users simultaneously without affecting
the correctness of the data.
Why Database:
In order to overcome the limitation of a file system, a new approach was required.
The advantages of database system over traditional, paper based methods of record
keeping are:
compactness:
speed:
A machine can retrieve and modify data much faster than a human being can.
A database management system consists of collection of related data and refers to a set of
Function of DBMS:
1. Defining database schema: it must give facility for defining the database
2. Manipulation of the database: The dbms must have functions like insertion of
3. Sharing of database:The DBMS must share data items for multiple users by
5. Database recovery: If the system fails for any reason, the DBMS must facilitate database
recovery.
Advantages of dbms:
Reduction of redundancies:
Centralized control of data by the DBA avoids unnecessary duplication of data and
effectively reduces the total amount of data storage required avoiding duplication in the
Sharing of data:
A database allows the sharing of data under its control by any number of application
programs or users.
Data Integrity:
Data integrity means that the data contained in the database is both accurate and
consistent. Data values being entered for storage can therefore be checked to ensure
that they fall within a specified range and are of the correct format.
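The range check described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not part of the original notes: the table name person, its columns and the 0 to 150 age range are assumptions chosen for the example.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK constraint keeps age within an assumed valid range.
conn.execute(
    "CREATE TABLE person ("
    " name TEXT NOT NULL,"
    " age INTEGER CHECK (age BETWEEN 0 AND 150))"
)

conn.execute("INSERT INTO person VALUES (?, ?)", ("Ravi", 25))

try:
    # A negative age violates the constraint, so the DBMS refuses to store it.
    conn.execute("INSERT INTO person VALUES (?, ?)", ("Ajit", -3))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT name, age FROM person").fetchall()
print(rejected, rows)
```

Because the rule lives in the database rather than in each application program, every program that inserts data is subject to the same check.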
Data Security:
The DBA, who has the ultimate responsibility for the data in the DBMS, can ensure that
proper access procedures are followed, including proper authentication schemes for access
to the DBMS and additional checks before permitting access to sensitive data.
Conflict resolution:
The DBA resolves conflicting requirements of the various users and applications. The DBA chooses
the best file structure and access method to get optimal performance for the
application.
Data Independence:
Types of databases.
Depending on usage requirements, the following types of databases are available in the
market:
1. Centralised database.
2. Distributed database.
3. Personal database.
4. End-user database.
5. Commercial database.
6. NoSQL database.
7. Operational database.
8. Relational database.
9. Cloud database.
10. Object-oriented database.
11. Graph database.
1. Centralised Database
The data is stored at a centralized location, and users from different locations
can access it. This type of database contains application procedures that help users
access the data even from a remote location.
Various authentication procedures are applied to verify and validate end
users. Likewise, the application procedures provide a registration number that keeps
track of data usage. The local area office handles this.
2.Distributed Database
In contrast to the centralized database, a distributed database combines contributions from
the common database with information captured by local computers. The data is
not in one place but is distributed across various sites of an organization. These sites are
connected to each other by communication links, which help them access the distributed data
easily.
You can think of a distributed database as one in which various portions of the database are
stored in multiple physical locations, with the application procedures
replicated and distributed among various points in a network.
There are two kinds of distributed database (DDB): homogeneous and heterogeneous. In a
homogeneous DDB, all physical locations have the same underlying hardware and run the same
operating systems and application procedures. In a heterogeneous DDB, the operating systems,
underlying hardware and application procedures can differ between sites.
3.Personal Database
Data is collected and stored on personal computers, which are small and easily manageable. The
data is generally used by the same department of an organization and is accessed by a small
group of people.
4.End-User Database
The end user is usually not concerned with the transactions or operations done at various levels
and is only aware of the product, which may be a piece of software or an application. This is
therefore a shared database designed specifically for end users, such as managers at different
levels. A summary of the whole information is collected in this database.
5.Commercial Database
These are paid versions of huge databases, designed for users who want to
access the information for help. These databases are subject specific, and an individual user
could not afford to maintain such a huge amount of information. Access to such databases is
provided through commercial links.
6.NoSQL Database
These are used for large sets of distributed data. Some big data performance issues
are not handled effectively by relational databases; such issues are more easily managed by
NoSQL databases. They are very efficient at analyzing large volumes of unstructured data that
may be stored on multiple virtual servers in the cloud.
7.Operational Database
Information related to operations of an enterprise is stored inside this database. Functional lines
like marketing, employee relations, customer service etc. require such kind of databases.
8.Relational Databases
These databases are characterized by a set of tables in which data fits into pre-defined
categories. Each table consists of rows and columns: a column holds the data for a specific
category, and each row contains one instance of the data defined by those categories. The
Structured Query Language (SQL) is the standard user and application program interface for a
relational database.
Various simple operations can be applied to the tables, which makes these
databases easy to extend and to join on a common relation, without having to modify all existing
applications.
9.Cloud Databases
Nowadays, data is increasingly stored in the cloud, also known as a virtual
environment, whether a hybrid, public or private cloud. A cloud database is a database that
has been optimized or built for such a virtualized environment. The benefits of a
cloud database include the ability to pay for storage capacity and bandwidth on a per-user
basis, scalability on demand, and high availability.
A cloud database also gives enterprises the opportunity to support business applications in a
software-as-a-service deployment.
10.Object-Oriented Databases
An object-oriented database is organized around objects rather than actions, and data rather than
logic. For example, a multimedia record in a relational database can be a definable data object, as
opposed to an alphanumeric value.
11.Graph Databases
The graph is a collection of nodes and edges where each node is used to represent an entity and
each edge describes the relationship between entities. A graph-oriented database, or graph
database, is a type of NoSQL database that uses graph theory to store, map and query
relationships.
Graph databases are basically used for analyzing interconnections. For example, companies
might use a graph database to mine data about customers from social media.
DATABASE USERS
Naive users :
Users who need not be aware of the presence of the database system or any other
system supporting their usage are considered naïve users . A user of an automatic teller
Application programmers :
or user interfaces utilized by the naïve and online user falls into this category.
Database Administration :
A person who has central control over the system is called database administrator .
definition
DATABASE MANAGER
Database manager:
A database manager is a program module which provides the interface between the low
level data stored in the database and the application programs and queries submitted to
the system.
1. Interaction with file manager: The data is stored on disk using the file system
provided by the operating system. The database manager translates the different DML statements
into low-level file system commands, so the database manager is responsible for the actual
storing, retrieving and updating of data in the database.
2. Integrity enforcement: The data values stored in the database must satisfy certain
constraints (e.g. the age of a person can't be less than zero). These constraints are specified
by the DBA. The database manager checks the constraints and stores the data in the
database only if they are satisfied.
3. Security enforcement: The database manager enforces the security measures that protect the
database from unauthorized users.
4. Backup and recovery: The database manager detects failures arising from different causes
(such as disk failure, power failure, deadlock or software error) and restores the database to
its state before the failure.
5. Concurrency control: When several users access the same database file simultaneously,
data inconsistency is possible. It is the responsibility of the database manager to control the
problems that arise from concurrent transactions.
DATABASE LANGUAGE :
DDL (data definition language) is used to define database objects. The conceptual schema is
specified by a set of definitions expressed in this language. It also gives some details about
how to implement this schema on the physical devices used to store the data. This definition
includes all the entity sets, their associated attributes and their relationships. The result of
DDL statements is a set of tables stored in a special file called the data dictionary.
A DML (data manipulation language) is a language that enables users to access or manipulate
data stored in the database. Data manipulation involves retrieval of data from the database,
insertion of new data into the database, and deletion or modification of existing data.
Procedural: requires a user to specify what data is needed and how to get it.
Non-procedural: requires a user to specify what data is needed without specifying how
to get it.
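The difference between DDL and DML, and between non-procedural and procedural access, can be sketched with Python's sqlite3 module. The student table follows the roll number, name and address example used earlier in these notes; the sample rows and town names are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: defines a database object (the table and its columns).
conn.execute(
    "CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, address TEXT)"
)

# DML: insertion, modification and deletion of data.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ravi", "Nairobi"), (2, "Ajit", "Eldoret")],  # hypothetical sample rows
)
conn.execute("UPDATE student SET address = ? WHERE roll_no = ?", ("Kisumu", 2))
conn.execute("DELETE FROM student WHERE roll_no = ?", (1,))

# Non-procedural retrieval: state WHAT is wanted; the DBMS decides how to find it.
declarative = conn.execute(
    "SELECT name FROM student WHERE address = 'Kisumu'"
).fetchall()

# Procedural retrieval: the program spells out HOW, scanning row by row.
procedural = []
for roll_no, name, address in conn.execute(
    "SELECT roll_no, name, address FROM student"
):
    if address == "Kisumu":
        procedural.append((name,))

print(declarative, procedural)
```

Both approaches return the same rows; the non-procedural form leaves the choice of access path to the DBMS.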
Data Control Language (DCL): this language enables users to grant and revoke authorization on
database objects.
TRANSACTION
A transaction is a unit of program execution that accesses and possibly updates various data
items. It is the DBMS's abstract view of a user program: a sequence of reads and
writes, that is, a sequence of many actions considered to be one atomic unit of work.
A transaction must see a consistent database. During transaction execution the database
may be temporarily inconsistent, but when the transaction completes successfully (is committed),
the database must be consistent. After a transaction commits, the changes it has made to the
database persist, even if there are system failures. Multiple transactions can execute in
parallel. Two main issues to deal with:
ACID Properties
To preserve the integrity of data the database system transaction mechanism must ensure:
Atomicity. Either all operations of the transaction are properly reflected in the database or
none are.
Isolation. Although multiple transactions may execute concurrently, each transaction must be
unaware of other concurrently executing transactions. Intermediate transaction results must be
hidden from other concurrently executing transactions.
Durability. After a transaction completes successfully, the changes it has made to the database
persist, even if there are system failures.
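Atomicity can be illustrated with a small bank transfer using Python's sqlite3 module. This is only a sketch under stated assumptions: the account table, the balances and the transfer helper are hypothetical, and a CHECK constraint stands in for the failure that interrupts the transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"  # assumed rule: no overdrafts
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one atomic unit of work."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every change in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


ok = transfer(conn, 1, 2, 30)       # succeeds: both updates are kept
after_ok = balances(conn)
failed = transfer(conn, 1, 2, 500)  # debit would go negative: both updates undone
after_failed = balances(conn)
print(ok, after_ok, failed, after_failed)
```

In the failed transfer the credit to account 2 has already been applied when the debit is rejected, yet the rollback removes it, so the database never shows a half-finished transfer.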
The entity-relationship data model perceives the real world as consisting of basic objects,
called entities, and relationships among these objects. It was developed to facilitate database
design by allowing specification of an enterprise schema, which represents the overall logical
structure of a database.
Basic concepts:
There are three basic elements in an ER Diagram: entity, attribute, relationship. There are more
elements which are based on the main elements. They are weak entity, multivalued attribute,
derived attribute, weak relationship and recursive relationship. Cardinality and ordinality are two
other notations used in ER diagrams to further define relationships.
Entity
An entity can be a person, place, event, or object that is relevant to a given system. For example,
a school system may include students, teachers, major courses, subjects, fees, and other items.
Entities are represented in ER diagrams by a rectangle and named using singular nouns.
Weak Entity
A weak entity is an entity that depends on the existence of another entity. In more technical
terms, it can be defined as an entity that cannot be identified by its own attributes. It uses a
foreign key combined with its own attributes to form the primary key. An entity like order item
is a good example of this. The order item is meaningless without an order, so it depends on the
existence of the order.
Attribute
specific attributes. For example, the attribute “customer address” can have the attributes number,
street, city, and state. These are called composite attributes. Note that some top level ER
diagrams do not show attributes for the sake of simplicity. In those that do, however, attributes
are represented by oval shapes.
Figure: attributes in ER diagrams. Note that an attribute can have its own attributes (composite
attribute).
Multivalued Attribute
If an attribute can have more than one value, it is called a multivalued attribute. It is
important to note that this is different from an attribute having its own attributes. For
example, a teacher entity can have multiple subject values.
Derived Attribute
An attribute based on another attribute. This is found rarely in ER diagrams. For example for a
circle the area can be derived from the radius.
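The three kinds of attribute described above can be sketched as plain Python data classes. The class and field names are assumptions chosen to mirror the examples in the text (customer address, teacher subjects, circle area); they do not come from any particular ER tool.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Address:
    """Composite attribute: an attribute that has its own attributes."""
    number: str
    street: str
    city: str
    state: str


@dataclass
class Teacher:
    name: str
    address: Address                                   # composite attribute
    subjects: list[str] = field(default_factory=list)  # multivalued attribute


@dataclass
class Circle:
    radius: float

    @property
    def area(self) -> float:
        """Derived attribute: computed from radius rather than stored."""
        return math.pi * self.radius ** 2


# Hypothetical sample values for illustration only.
teacher = Teacher(
    name="A. Teacher",
    address=Address("12", "Main Street", "Nairobi", "Nairobi County"),
    subjects=["Databases", "Operating Systems"],
)
circle = Circle(radius=2.0)
print(teacher.subjects, circle.area)
```

Keeping area as a computed property rather than a stored field means it can never disagree with the radius it is derived from.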
RELATIONSHIP
A relationship describes how entities interact. For example, the entity “carpenter” may be related
to the entity “table” by the relationship “builds” or “makes”. Relationships are represented by
diamond shapes and are labeled using verbs.
Recursive Relationship
If the same entity participates more than once in a relationship, it is known as a recursive
relationship. For example, an employee can be a supervisor and can also be supervised, so there
is a recursive relationship.
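One common way to implement a recursive relationship in a relational database is a foreign key that points back to the same table. The sketch below uses Python's sqlite3 module; the employee table, its columns and the three names are assumptions made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# supervisor_id refers back to employee.emp_id: the entity relates to itself.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " supervisor_id INTEGER REFERENCES employee(emp_id))"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Amina", None), (2, "Brian", 1), (3, "Chao", 1)],  # hypothetical staff
)

# Join the table to itself: e plays the supervised role, s the supervisor role.
pairs = conn.execute(
    "SELECT e.name, s.name"
    " FROM employee AS e JOIN employee AS s ON e.supervisor_id = s.emp_id"
    " ORDER BY e.emp_id"
).fetchall()
print(pairs)
```

The same employee table appears twice in the query under two aliases, which matches the two roles the entity plays in the relationship.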
Cardinality and ordinality further define relationships between entities by placing the
relationship in the context of numbers. In an email system, for example, one account can have
multiple contacts. The relationship in this case follows a “one to many” model. There are a
number of notations used to present cardinality in ER diagrams. Chen, UML, Crow’s foot and
Bachman are some of the popular notations.
One to one:
[Figure: COLLEGE (1) HAS PRINCIPAL (1)]
One to many:
Many to many:
Entities in A and B are associated with any number of entities from each other.
© JONAH K NGETICH 0720-254951/0780-254951 Page 23
Keys:
A super key is a set of one or more attributes that taken collectively, allow us to
Candidate key:
Eg: (cname, telno)
Primary key:
The primary key is the candidate key that is chosen by the database designer as the
principal means of identifying entities within an entity set. The remaining candidate
Identify all the relevant entities in a given system and determine the relationships among
these entities.
An entity should appear only once in a particular diagram.
Provide a precise and appropriate name for each entity, attribute, and relationship in the
diagram. Terms that are simple and familiar always beat vague, technical-sounding
words. In naming entities, remember to use singular nouns. However, adjectives may be
used to distinguish entities belonging to the same class (part-time employee and full-time
employee, for example). Meanwhile, attribute names must be meaningful, unique,
system-independent, and easily understandable.
Remove vague, redundant or unnecessary relationships between entities.
Never connect a relationship to another relationship.
Make effective use of colors. You can use colors to classify similar entities or to highlight
key areas in your diagrams
Benefits of ER diagrams
ER diagrams constitute a very useful framework for creating and manipulating databases. First,
ER diagrams are easy to understand and do not require a person to undergo extensive training to
be able to work with them efficiently and accurately. This means that designers can use ER diagrams
to easily communicate with developers, customers, and end users, regardless of their IT
proficiency. Second, ER diagrams are readily translatable into relational tables which can be
used to quickly build databases. In addition, ER diagrams can directly be used by database
developers as the blueprint for implementing data in specific software applications. Lastly, ER
diagrams may be applied in other contexts such as describing the different relationships and
operations within an organization.
Logical data are the data for the table created by user in primary memory.
A schema is a logical database description and is drawn as a chart of the types of data that are
used. It gives the names of the entities and attributes and specifies the relationships between
them. A database schema includes such information as:
A subschema is a schema derived from an existing schema as per the user's
requirements. There may be more than one subschema created for a single conceptual
schema.
THREE-SCHEMA ARCHITECTURE
2.Conceptual level -Describes structure of the whole DB for the complete community of users
3.External or view level -Describes part of the DB of interest to a particular user group
Database Instance
It is important to distinguish between a database schema and a database instance. The database
schema is the skeleton of the database. It is designed when the database doesn't exist at all.
Once the database is operational, it is very difficult to make any changes to the schema. A
database schema does not contain any data or information.
A database instance is the state of an operational database, with its data, at a given time. It
contains a snapshot of the database. Database instances tend to change with time. A DBMS ensures
that every instance (state) is valid by diligently following all the validations, constraints,
and conditions that the database designers have imposed.
DATA INDEPENDENCE
A database system normally contains a lot of data in addition to users’ data. For example, it
stores data about data, known as metadata, to locate and retrieve data easily. It is rather difficult
to modify or update a set of metadata once it is stored in the database. But as a DBMS expands,
it needs to change over time to satisfy the requirements of the users. If all the data were
interdependent, making such changes would become a tedious and highly complex job.
2.Feasibility study
Feasibility is defined as the practical extent to which a project can be performed successfully. To
evaluate feasibility, a feasibility study is performed, which determines whether the solution
considered to accomplish the requirements is practical and workable in the software. Information
such as resource availability, cost estimation for software development, benefits of the software
to the organization after it is developed and cost to be incurred on its maintenance are considered
during the feasibility study. The objective of the feasibility study is to establish the reasons for
developing the software that is acceptable to users, adaptable to change and conformable to
established standards. Various other objectives of feasibility study are listed below.
TYPES OF FEASIBILITY
Analyzes the technical skills and capabilities of the software development team members
Determines whether the relevant technology is stable and established
Ascertains that the technology chosen for software development has a large number of users so
that they can be consulted when problems arise or improvements are required.
Operational feasibility assesses the extent to which the required software performs a series of
steps to solve business problems and user requirements. This feasibility is dependent on human
resources (software development team) and involves visualizing whether the software will
operate after it is developed and be operative once it is installed. Operational feasibility also
performs the following tasks.
Determines whether the problems anticipated in user requirements are of high priority
Determines whether the solution suggested by the software development team is acceptable
Analyzes whether users will adapt to a new software
Determines whether the organization is satisfied by the alternative solutions proposed by the
software development team.
3. SYSTEMS ANALYSIS
The analysis phase is where businesses will work on the source of their problem or the need for a
change. In the event of a problem, possible solutions are submitted and analyzed to identify the
best fit for the ultimate goal(s) of the project. This is where teams consider the functional
requirements of the project or solution. It is also where system analysis takes place—or
analyzing the needs of the end users to ensure the new system can meet their expectations.
Systems analysis is vital in determining what a business's needs are, as well as how they can be
met, who will be responsible for individual pieces of the project, and what sort of timeline should
be expected. There are several tools businesses can use that are specific to the second phase.
They include:
Structured analysis
4.SYSTEM DESIGN
The design phase describes, in detail, the necessary specifications, features and operations that
will satisfy the functional requirements of the proposed system which will be in place. This is the
step for end users to discuss and determine their specific business information needs for the
proposed system. It's during this phase that they will consider the essential components
(hardware and/or software) structure (networking capabilities), processing and procedures for the
system to accomplish its objectives.
The fifth phase involves systems integration and system testing (of programs and procedures)—
normally carried out by a Quality Assurance (QA) professional—to determine if the proposed
design meets the initial set of business goals. Testing may be repeated, specifically to check for
errors, bugs and interoperability. This testing will be performed until the end user finds it
acceptable. Another part of this phase is verification and validation, both of which will help
ensure the program's successful completion.
TYPES OF TESTING
Unit testing
Integration testing
System testing
Unit testing is a level of software testing where individual units or components of the software
are tested. The purpose is to validate that each unit of the software performs as designed. A
unit is the smallest testable part of any software. It usually has one or a few inputs and
usually a single output. In procedural programming, a unit may be an individual program,
function, procedure, etc.
Integration testing is a level of software testing where individual units are combined and tested
as a group. The purpose of this level of testing is to expose faults in the interaction between
integrated units. Test drivers and test stubs are used to assist in integration testing.
System testing is a level of software testing where a complete and integrated software system is
tested. The purpose of this test is to evaluate the system’s compliance with the specified
requirements.
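As a small illustration of unit testing, the sketch below tests a single function with Python's standard unittest module. The function apply_discount and its rules are hypothetical, invented only to have a unit with a few inputs and a single output.

```python
import unittest


def apply_discount(price, percent):
    """Hypothetical unit under test: two inputs, one output."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (100 - percent) / 100


class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        # Validates that the unit performs as designed for normal input.
        self.assertEqual(apply_discount(200, 25), 150)

    def test_rejects_invalid_percent(self):
        # Validates the unit's behaviour for input outside the allowed range.
        with self.assertRaises(ValueError):
            apply_discount(200, 120)


# Load and run the test case programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())
```

Integration and system testing follow the same idea at a larger scale, exercising combined units and then the whole system instead of one function.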
6.SYSTEM IMPLEMENTATION
The sixth phase is when the majority of the code for the program is written. Additionally, this
phase involves the actual installation of the newly-developed system. This step puts the project
into production by moving the data and components from the old system and placing them in the
new system via a direct changeover. While this can be a risky (and complicated) move, the
changeover typically happens during off-peak hours, thus minimizing the risk. Both system
analysts and end-users should now see the realization of the project that has implemented
changes.
system. Implementing the new system one department at a time, the company converts accounts
receivable, accounts payable, payroll, and so on. Advantages to phased changeovers are their low
cost and isolated errors. The main disadvantage is the process takes a long time to complete
because phases need to be implemented separately.
Pilot Changeover-With a pilot changeover, the new system is tried out at a test site before
launching it company-wide. For example, a bank may first test the system at one of its branches.
This branch is referred to as the pilot, or beta, site for the program. Since parallel changeovers
tend to be expensive, using the pilot changeover technique allows companies to run the new
system next to their old but on a much smaller scale. This makes the pilot changeover method
much more cost-effective. After the problems are worked out of the system at the test site,
companies usually opt to use the direct changeover technique to launch the system company-
wide.
7.REVIEW AND MAINTENANCE
The seventh and final phase involves maintenance and regular required updates. This step is
when end users can fine-tune the system, if they wish, to boost performance, add new
capabilities or meet additional user requirements.
Types of Maintenance
i. Corrective
ii. Adaptive
iii. Perfective
iv. Preventive
Maintenance is focused on decreasing the deterioration of your software in the long run.
software can have in the long term and helps it become scalable, stable, understandable
and maintainable.
NORMALIZATIONS
Normalization is a database design technique which organizes tables in a manner that reduces
redundancy and dependency of data. It divides larger tables into smaller tables and links them
using relationships.
Database normalization is a technique for organizing the data in the database. It is
a systematic approach of decomposing tables to eliminate data redundancy (repetition) and
undesirable characteristics like insertion, update and deletion anomalies. It is a multi-step
process that puts data into tabular form, removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e. data is logically stored.
Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
Our table already satisfies 3 of the 4 rules: all our column names are unique, we have
stored the data in the order we wanted, and we have not mixed different types of data in
columns.
But of the 3 students in our table, 2 have opted for more than 1 subject, and we
have stored the subject names in a single column. As per the First Normal Form, each column
must contain atomic values.
How to solve this Problem?
It's very simple, because all we have to do is break the values into atomic values.
Here is our updated table and it now satisfies the First Normal Form.
101 Akon OS
101 Akon CN
102 Bkon C
By doing so, a few values are repeated, but the values in the subject column are
now atomic for each record/row. Using the First Normal Form, data redundancy increases, as
the same data will appear in a column across multiple rows, but each row as a whole will be
unique.
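The step of breaking values into atomic values can be sketched in a few lines of Python. The un-normalized input below is an assumption reconstructed from the three rows shown in the updated table (subjects stored as one comma-separated string), and the function name to_first_normal_form is hypothetical.

```python
# Assumed un-normalized rows: the subject column holds several values at once.
unnormalized = [
    (101, "Akon", "OS, CN"),
    (102, "Bkon", "C"),
]


def to_first_normal_form(rows):
    """Emit one row per subject so every column value is atomic."""
    atomic = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            atomic.append((roll_no, name, subject.strip()))
    return atomic


atomic_rows = to_first_normal_form(unnormalized)
for row in atomic_rows:
    print(row)
```

The roll number and name are repeated for student 101, which is the increase in redundancy that the text describes, but each resulting row is unique as a whole.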
In this table, student_id is the primary key and will be unique for every row, hence we can
use student_id to fetch any row of data from this table
Even for a case, where student names are same, if we know the student_id we can easily fetch
the correct record.
Hence we can say a Primary Key for a table is the column or a group of columns(composite key)
which can uniquely identify each record in the table.
I can ask for the branch name of the student with student_id 10, and I can get it. Similarly, if
I ask for the name of the student with student_id 10 or 11, I will get it. So all I need is
student_id, and every other column depends on it, or can be fetched using it.
This is dependency, and we also call it functional dependency.
For a simple table like Student, a single column like student_id can uniquely identify all the
records in a table.
But this is not true all the time. So now let's extend our example to see if more than 1 column
together can act as a primary key.
Let's create another table for Subject, which will have subject_id and subject_name fields
and subject_id will be the primary key.
subject_id subject_name
1 Java
2 C++
3 Php
Now we have a Student table with student information and another table Subject for storing
subject information.
Let's create another table Score, to store the marks obtained by students in the respective
subjects. We will also be saving name of the teacher who teaches that subject along with marks.
score_id student_id subject_id marks teacher
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the Score table we save the student_id to know which student the marks belong to,
and the subject_id to know which subject the marks are for.
Together, student_id + subject_id forms a Candidate Key for this table, which can be chosen
as the Primary Key.
Confused about how this combination can be a primary key?
See, if I ask you to get me marks of student with student_id 10, can you get it from this table?
No, because you don't know for which subject. And if I give you subject_id, you would not
know for which student. Hence we need student_id + subject_id to uniquely identify any row.
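The same point can be tried out with Python's built-in sqlite3 module. This is a minimal sketch using the three Score rows shown above; the column name score_id for the first column is an assumption, and declaring the composite key this way is one possible design rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE score (
        score_id   INTEGER,           -- assumed name for the first column
        student_id INTEGER NOT NULL,
        subject_id INTEGER NOT NULL,
        marks      INTEGER,
        teacher    TEXT,
        PRIMARY KEY (student_id, subject_id)
    )
    """
)
conn.executemany(
    "INSERT INTO score VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, 1, 70, "Java Teacher"),
        (2, 10, 2, 75, "C++ Teacher"),
        (3, 11, 1, 80, "Java Teacher"),
    ],
)

# student_id alone matches more than one row.
by_student = conn.execute(
    "SELECT marks FROM score WHERE student_id = 10 ORDER BY subject_id"
).fetchall()

# student_id together with subject_id matches exactly one row.
by_pair = conn.execute(
    "SELECT marks FROM score WHERE student_id = 10 AND subject_id = 2"
).fetchall()

# A second row for the same (student_id, subject_id) pair is rejected.
try:
    conn.execute("INSERT INTO score VALUES (4, 10, 1, 99, 'Java Teacher')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(by_student, by_pair, duplicate_rejected)
```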
But where is Partial Dependency?
If you look at the Score table, it has a column named teacher which depends only on
the subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.
As we just discussed, the primary key for this table is a composition of two columns,
student_id and subject_id, but the teacher's name depends only on the subject, and hence on
subject_id, and has nothing to do with student_id.
This is Partial Dependency, where an attribute in a table depends on only a part of the primary
key and not on the whole key.
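To remove it, we take the teacher column out of the Score table and move it to a table where
it depends on the whole key, such as the Subject table, whose primary key is subject_id.
Our Score table is then in the Second Normal Form, with no partial dependency.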
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
NB
For a table to be in the Second Normal form, it should be in the First Normal form and it
should not have Partial Dependency.
Partial Dependency exists when, for a composite primary key, an attribute in the table
depends on only a part of the primary key and not on the complete primary key.
To remove Partial dependency, we can divide the table, remove the attribute which is
causing partial dependency, and move it to some other table where it fits in well.
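The decomposition can be sketched with sqlite3 as follows. It is an illustration under stated assumptions: the teacher column is moved into the Subject table, the score_id column is left out for brevity, and the teacher name for Php is assumed by analogy with the other two subjects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- teacher now depends on the whole key of its table (subject_id).
    CREATE TABLE subject (
        subject_id   INTEGER PRIMARY KEY,
        subject_name TEXT,
        teacher      TEXT
    );
    -- score keeps only attributes that depend on the full composite key.
    CREATE TABLE score (
        student_id INTEGER,
        subject_id INTEGER REFERENCES subject(subject_id),
        marks      INTEGER,
        PRIMARY KEY (student_id, subject_id)
    );
    INSERT INTO subject VALUES
        (1, 'Java', 'Java Teacher'),
        (2, 'C++',  'C++ Teacher'),
        (3, 'Php',  'Php Teacher');   -- assumed by analogy
    INSERT INTO score VALUES (10, 1, 70), (10, 2, 75), (11, 1, 80);
    """
)

# A join recovers the original wide rows, so no information is lost.
joined = conn.execute(
    """
    SELECT s.student_id, s.subject_id, s.marks, sub.teacher
    FROM score AS s
    JOIN subject AS sub ON sub.subject_id = s.subject_id
    ORDER BY s.student_id, s.subject_id
    """
).fetchall()
for row in joined:
    print(row)
```

Each teacher name is now stored once per subject instead of once per score row.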
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,
1. It is in the Second Normal form.
2. And, it doesn't have Transitive Dependency.
So let's use the same example, where we have 3 tables, Student, Subject and Score.
Student Table
Subject Table
Score Table
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the exam name and total
marks, so let's add 2 more columns to the Score table.
The column total_marks depends on exam_name, because the total score changes with the type
of exam. For example, practicals carry fewer marks than theory exams.
But exam_name is just another column in the Score table. It is neither the primary key nor a
part of it, and yet total_marks depends on it.
This is Transitive Dependency: a non-prime attribute depends on another non-prime attribute
rather than on the prime attributes or the primary key. To remove it, we take exam_name and
total_marks out of the Score table, put them in a separate table of exams such as the one
below, and refer to each exam from the Score table by its identifier.
exam_id exam_name total_marks
1 Workshop 200
2 Mains 70
3 Practicals 30
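A minimal sqlite3 sketch of this arrangement is given below. The table name exam, the column name exam_id and the assignment of exams to the individual score rows are assumptions made for illustration; only the exam names and total marks come from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- total_marks depends on the exam, so both live in their own table.
    CREATE TABLE exam (
        exam_id     INTEGER PRIMARY KEY,
        exam_name   TEXT,
        total_marks INTEGER
    );
    -- score refers to an exam by id instead of repeating its details.
    CREATE TABLE score (
        student_id INTEGER,
        subject_id INTEGER,
        exam_id    INTEGER REFERENCES exam(exam_id),
        marks      INTEGER,
        PRIMARY KEY (student_id, subject_id)
    );
    INSERT INTO exam VALUES
        (1, 'Workshop', 200), (2, 'Mains', 70), (3, 'Practicals', 30);
    -- Which exam each score belongs to is invented for this example.
    INSERT INTO score VALUES (10, 1, 2, 70), (10, 2, 1, 75), (11, 1, 1, 80);
    """
)

# total_marks is looked up through exam_id when it is needed.
report = conn.execute(
    """
    SELECT s.student_id, s.marks, e.exam_name, e.total_marks
    FROM score AS s
    JOIN exam AS e ON e.exam_id = s.exam_id
    ORDER BY s.student_id, s.subject_id
    """
).fetchall()
for row in report:
    print(row)
```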
If a database design is not perfect, it may contain anomalies, which are like a
bad dream for any database administrator. Managing a database with anomalies
is next to impossible.
Update anomalies: If data items are scattered and not properly linked to each other, strange
situations can arise. For example, when we update a data item whose copies are scattered over
several places, a few instances get updated properly while others are left with old values.
Such instances leave the database in an inconsistent state.
Deletion anomalies: We try to delete a record, but parts of it are left undeleted because,
unknown to us, the data is also saved somewhere else.
Insert anomalies: We try to insert data into a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a
consistent state.
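An update anomaly is easy to reproduce. The sketch below reuses the Score rows from the earlier example; the table names score_flat and subject and the value 'New Java Teacher' are hypothetical. It first updates only one copy of a repeated value, then shows the same change on a normalized table where the value is stored once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Unnormalized: the teacher name is repeated on every score row.
    CREATE TABLE score_flat (
        student_id INTEGER, subject_id INTEGER, marks INTEGER, teacher TEXT
    );
    INSERT INTO score_flat VALUES
        (10, 1, 70, 'Java Teacher'),
        (10, 2, 75, 'C++ Teacher'),
        (11, 1, 80, 'Java Teacher');
    -- Normalized: one row per subject holds the teacher name.
    CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, teacher TEXT);
    INSERT INTO subject VALUES (1, 'Java Teacher'), (2, 'C++ Teacher');
    """
)

# A careless update that touches only one of the two Java rows.
conn.execute(
    "UPDATE score_flat SET teacher = 'New Java Teacher' "
    "WHERE student_id = 10 AND subject_id = 1"
)
flat_teachers = conn.execute(
    "SELECT DISTINCT teacher FROM score_flat WHERE subject_id = 1 ORDER BY teacher"
).fetchall()

# In the normalized table the same change is a single-row update.
conn.execute("UPDATE subject SET teacher = 'New Java Teacher' WHERE subject_id = 1")
normalized_teachers = conn.execute(
    "SELECT teacher FROM subject WHERE subject_id = 1"
).fetchall()

print(flat_teachers)        # two different names for the same subject
print(normalized_teachers)  # exactly one name
```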
REVISION QUESTIONS
1. Keys. What is the difference between a candidate key, a primary key and a
composite key? What considerations might influence the choice of a primary key?
2. Give a brief description of the following
i. DBMS
ii. Data Dictionary (DD)
iii. Data Manipulation Language(DML)
3. Outline the main responsibilities of the Database Administrator, End-users and
Application Developers.
4. Define and hence explain the use of
i. Data mining
ii. Data integrity
5. Many organizations decide to use a database rather than a file management system for
their applications. Explain the difference between a file management system and a
database.
6. Carefully distinguish between each of the following pairs of terms
(i) Field and record
(ii) Transaction file and back-up file
(iii) Object-oriented database model and distributed database model
(iv) Atomicity and consistency
7. A business has a file of customer orders which consists of records containing the fields:
a) Customer reference number
b) Date of order
c) Customer name and address
d) Delivery town
e) Product code
f) Quantity
(i) Define the term key field for a record and identify the key field for the file of
customer orders above.
(ii) What is a sort key (secondary key)? By reference to the above file, explain how
processing the file using a sort key can produce useful information for the
business.
8. List and explain the advantages and disadvantages of Database Management Systems.
9. Discuss the following processes as expressed in the database/system development life cycle
(SDLC)
(i) Information systems planning
(ii) System initialization and identification
(iii) Feasibility study
(iv) System investigation
(v) System analysis
(vi) System implementation
(vii) System maintenance and documentations