Data Independence: UNIT-2
Data Independence
The three-schema architecture can be used to explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data
independence:
1. Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database (by adding a record type or data item), or to
reduce the database (by removing a record type or data item). In the latter case,
external schemas that refer only to the remaining data should not be affected. Only the
view definition and the mappings need be changed in a DBMS that supports logical data
independence. Application programs that reference the external schema constructs
must work as before, after the conceptual schema undergoes a logical reorganization.
Changes to constraints can also be applied to the conceptual schema without affecting
the external schemas or application programs.
2. Physical data independence is the capacity to change the internal schema without having
to change the conceptual (or external) schemas. Changes to the internal schema may be
needed because some physical files had to be reorganized—for example, by creating
additional access structures—to improve the performance of retrieval or update. If the
same data as before remains in the database, we should not have to change the
conceptual schema.
Whenever we have a multiple-level DBMS, its catalog must be expanded to include information
on how to map requests and data among the various levels. The DBMS uses additional software
to accomplish these mappings by referring to the mapping information in the catalog. Data
independence is accomplished because, when the schema is changed at some level, the schema
at the next higher level remains unchanged; only the mapping between the two levels is
changed. Hence, application programs referring to the higher-level schema need not be
changed.
The three-schema architecture can make it easier to achieve true data independence, both
physical and logical. However, the two levels of mappings create an overhead during
compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because
of this, few DBMSs have implemented the full three-schema architecture.
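The external-to-conceptual mapping described above is easy to see with SQL views. The following is a minimal sketch using Python's sqlite3 module; the table, view and column names (employee, emp_public) are invented for illustration. It shows that an application querying a view keeps working unchanged after the conceptual schema is logically reorganized:

```python
import sqlite3

# Illustrative sketch of logical data independence; all names are invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conceptual schema: a base table.
cur.execute("CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL)")
cur.execute("INSERT INTO employee VALUES (1, 'Ana', 50000), (2, 'Ben', 60000)")

# External schema: a view exposing only what the application needs.
cur.execute("CREATE VIEW emp_public AS SELECT emp_id, name FROM employee")

# The application queries the view, never the base table.
before = cur.execute("SELECT name FROM emp_public ORDER BY emp_id").fetchall()

# Logical reorganization of the conceptual schema: add a data item.
cur.execute("ALTER TABLE employee ADD COLUMN dept TEXT")

# The view definition and the application query are untouched, yet the
# query still returns the same result: logical data independence.
after = cur.execute("SELECT name FROM emp_public ORDER BY emp_id").fetchall()
print(before == after)
```

In a full DBMS the same role is played by the mapping information in the catalog: only the mapping between levels changes, not the higher-level schema.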
DATABASE ADMINISTRATION
Database administration is the function of managing and maintaining database management
systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft
SQL Server needs ongoing management. As such, corporations that use DBMS software often
hire specialized IT (Information Technology) personnel called Database Administrators or DBAs.
The degree to which the administration of a database is automated dictates the skills and
personnel required to manage databases. On one end of the spectrum, a system with minimal
automation will require significant experienced resources to manage; perhaps 5-10 databases
per DBA. Alternatively, an organization might choose to automate a significant amount of the
work that could be done manually, thereby reducing the skills required to perform tasks. As
automation increases, the personnel needs of the organization split into highly skilled workers
who create and manage the automation and a group of lower-skilled "line" DBAs who simply
execute it.
Database administration work is complex, repetitive, time-consuming and requires significant
training. Since databases hold valuable and mission-critical data, companies usually look for
candidates with multiple years of experience. Database administration often requires DBAs to
put in work during off-hours (for example, for planned after-hours downtime, in the event of a
database-related outage, or if performance has been severely degraded). DBAs are commonly
well compensated for the long hours.
One key skill that is required, and often overlooked, when selecting a DBA is database recovery
(part of disaster recovery). It is not a case of "if" but a case of "when" a database suffers a
failure, ranging from a simple failure to a full catastrophic one. The failure may be data
corruption, media failure, or user-induced error. In any of these situations the DBA must have
the skills to recover the database to a given point in time to prevent a loss of data. Even a
highly skilled DBA may need anywhere from a few minutes to many hours to bring the
database back to an operational state.
Another kind of database software exists to manage the provisioning of new databases and the
management of existing databases and their related resources. The process of creating a new
database can consist of hundreds or thousands of unique steps, from satisfying prerequisites to
configuring backups, where each step must succeed before the next can start. A human cannot
be expected to complete this procedure in the same exact way time after time, yet that
repeatability is exactly what is needed when multiple databases exist. As the number of DBAs
grows, without automation the number of unique configurations frequently grows to be costly
and difficult to support. All of these complicated procedures can be modeled by the best DBAs
into database automation software and executed by the standard DBAs. Software has been
created specifically to improve the reliability and repeatability of these procedures, such as
Stratavia's Data Palette and GridApp Systems Clarity.
DBA Responsibilities:
Installation, configuration and upgrading of Database server software and related
products.
Evaluate Database features and Database related products.
Establish and maintain sound backup and recovery policies and procedures.
Take care of the Database design and implementation.
Implement and maintain database security (create and maintain users and roles, assign
privileges).
Database tuning and performance monitoring.
Application tuning and performance monitoring.
Setup and maintain documentation and standards.
Plan growth and changes (capacity planning).
Work as part of a team and provide 24x7 support when required.
Do general technical troubleshooting and give consultation to development teams.
Database recovery.
UNIT-3
Database Model
A Database model defines the logical design of data. The model describes the relationships
between different parts of the data. Historically, in database design, three models are
commonly used. They are:
1) Hierarchical Model: In this model each entity has only one parent but can have several
children. At the top of the hierarchy there is only one entity, which is called the root.
Advantages:
Simplicity: Data naturally has a hierarchical relationship in most practical situations, so
it is easier to view data arranged in this manner. This makes this type of database
suitable for such purposes.
Security: These database systems can enforce varying degrees of security, unlike
flat-file systems.
Database Integrity: Because of its inherent parent-child structure, database integrity is
highly promoted in these systems.
Efficiency: The hierarchical database model is very efficient when the database
contains a large number of 1:N (one-to-many) relationships and when users require a
large number of transactions on data whose relationships are fixed.
Disadvantages:
Complexity of Implementation: The actual implementation of a hierarchical database
depends on the physical storage of data. This makes the implementation complicated.
Difficulty in Management: The movement of a data segment from one location to
another causes all the accessing programs to be modified, making database management
a complex affair.
Complexity of Programming: Programming a hierarchical database is relatively complex
because the programmers must know the physical path of the data items.
Operational Anomalies: As discussed earlier, the hierarchical model suffers from insert,
update and deletion anomalies; the retrieval operation is also complex and asymmetric.
Thus the hierarchical model is not suitable for all cases.
2) Network Model: In the network model, entities are organized in a graph, in which some
entities can be accessed through several paths.
Advantages:
Conceptual simplicity: Just like the hierarchical model, the network model is also
conceptually simple and easy to design.
Capability to handle more relationship types: The network model can handle both
one-to-many (1:N) and many-to-many (N:N) relationships, which is a real help in
modeling real-life situations.
Ease of data access: Data access is easier and more flexible than in the hierarchical model.
Data Integrity: The network model does not allow a member to exist without an
owner. Thus, a user must first define the owner record and then the member record.
This ensures the data integrity.
Data independence: The network model is better than the hierarchical model in
isolating the programs from the complex physical storage details.
Disadvantages:
System complexity: All the records are maintained using pointers and hence the whole
database structure becomes very complex.
Operational Anomalies: As discussed earlier, insertion, deletion and update operations
on any record in the network model require a large number of pointer adjustments,
which makes its implementation very complex.
Absence of structural independence: Since the data access method in the network
database model is a navigational system, making structural changes to the database is
very difficult in most cases and impossible in some cases. If changes are made to the
database structure then all the application programs need to be modified before they
can access data. Thus, even though the network database model succeeds in achieving
data independence, it still fails to achieve structural independence.
3) Relational Model: In this model, data is organized in two-dimensional tables called
relations, and relationships between tables are represented through common attributes.
Advantages:
Structural independence: In relational model, changes in the database structure do not
affect the data access. When it is possible to make change to the database structure
without affecting the DBMS's capability to access data, we can say that structural
independence has been achieved. So, relational database model has structural
independence.
Conceptual simplicity: We have seen that both the hierarchical and the network
database model were conceptually simple. But the relational database model is even
simpler at the conceptual level. Since the relational data model frees the designer from
the physical data storage details, the designers can concentrate on the logical view of
the database.
Design, implementation, maintenance and usage ease: The relational database model
achieves both data independence and structural independence, making database
design, maintenance, administration and usage much easier than in the other models.
Disadvantages:
Hardware overheads: Relational database systems hide the implementation
complexities and the physical data storage details from the users. To make things
easier for the users in this way, relational database systems need more powerful
hardware and data storage devices, so an RDBMS needs powerful machines to run
smoothly. However, since the processing power of modern computers is increasing at
an exponential rate, the need for more processing power is no longer a very big issue.
Ease of design can lead to bad design: The relational database is easy to design and
use. Users need not know the complex details of physical data storage, nor how the
data is actually stored in order to access it. This ease of design and use can lead to the
development and implementation of very poorly designed database management
systems.
'Information island' phenomenon: As we have said before, the relational database
systems are easy to implement and use. This will create a situation where too many
people or departments will create their own databases and applications. These
information islands will prevent the information integration that is essential for the
smooth and efficient functioning of the organization. These individual databases will
also create problems like data inconsistency, data duplication, data redundancy and so
on.
Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is
a systematic approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that
puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e. data is logically stored.
Update Anomaly: To update the address of a student who occurs twice or more in a
table, we have to update the S_Address column in all of those rows; otherwise the data
becomes inconsistent.
Insertion Anomaly: Suppose that for a new admission we have the Student id (S_id),
name and address of a student, but the student has not opted for any subjects yet.
Then we have to insert NULL in the subject column, leading to an insertion anomaly.
Deletion Anomaly: If S_id 401 has only one subject and temporarily drops it, then when
we delete that row, the entire student record is deleted along with it.
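The three anomalies can be reproduced in a few lines. Below is a minimal sketch using Python's sqlite3; the column names S_id and S_Address follow the text, while the other names and all sample rows are invented for illustration:

```python
import sqlite3

# Sketch of the three anomalies in a single unnormalized table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE student (S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject TEXT)")
cur.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (401, 'Adam', 'Noida', 'Biology'),   # invented sample rows
    (401, 'Adam', 'Noida', 'Maths'),
    (402, 'Alex', 'Panipat', 'Maths'),
])

# Update anomaly: changing one student's address must touch every copy;
# if any copy is missed, the copies disagree.
cur.execute("UPDATE student SET S_Address = 'Delhi' WHERE S_id = 401")
addresses = {r[0] for r in cur.execute(
    "SELECT S_Address FROM student WHERE S_id = 401")}

# Insertion anomaly: a new student with no subject yet forces a NULL.
cur.execute("INSERT INTO student VALUES (403, 'Stuart', 'Agra', NULL)")

# Deletion anomaly: removing 402's only subject deletes the whole record.
cur.execute("DELETE FROM student WHERE S_id = 402 AND Subject = 'Maths'")
remaining = cur.execute(
    "SELECT COUNT(*) FROM student WHERE S_id = 402").fetchone()[0]
print(addresses, remaining)
```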
Normalization Rule: Normalization rules are divided into the following normal forms.
1) First Normal Form (1NF): As per First Normal Form, no row of data may contain a
repeating group of information; that is, every column must hold a single atomic value.
Each table should be organized into rows, and each row should have a primary key that
distinguishes it as unique.
The Primary key is usually a single column, but sometimes more than one column can be
combined to create a single primary key. For example consider a table which is not in First
normal form
Student Table:
Student    Age    Subject
Adam       15     Biology, Maths
Alex       14     Maths
Stuart     17     Maths
In First Normal Form, no column in any row may store more than one value (for example,
values separated by commas). Instead, we must separate such data into multiple rows.
Student Table following 1NF will be:
Student    Age    Subject
Adam       15     Biology
Adam       15     Maths
Alex       14     Maths
Stuart     17     Maths
After applying First Normal Form, data redundancy increases, as the same data is repeated
across multiple rows, but each row as a whole is unique.
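The 1NF conversion described above, splitting a comma-separated Subject column into one row per value, can be sketched in plain Python; the sample data mirrors the Adam/Alex/Stuart example:

```python
# A table whose Subject column stores comma-separated values (not in 1NF).
unnormalized = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

# 1NF: one atomic subject per row, duplicating the other columns,
# so that each column of each row holds a single value.
first_nf = [
    (name, age, subject.strip())
    for name, age, subjects in unnormalized
    for subject in subjects.split(",")
]
print(first_nf)
```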
2) Second Normal Form (2NF): As per the Second Normal Form there must not be any partial
dependency of any column on primary key. It means that for a table that has concatenated
primary key, each column in the table that is not part of the primary key must depend upon
the entire concatenated key for its existence. If any column depends only on one part of the
concatenated key, then the table fails Second normal form.
In the First Normal Form example there are two rows for Adam, to include the multiple
subjects that he has opted for. While this is searchable and follows First Normal Form, it is an
inefficient use of space. Also, in the table above, while the candidate key is
{Student, Subject}, the Age of a student depends only on the Student column, which is
incorrect as per Second Normal Form. To achieve Second Normal Form, we split the
subjects out into an independent table and match them up using the student names as
foreign keys.
New Student Table following 2NF will be:
Student    Age
Adam       15
Alex       14
Stuart     17
In the Student table the candidate key is the Student column, because the only other
column, Age, is dependent on it.
New Subject Table introduced for 2NF will be:
Student    Subject
Adam       Biology
Adam       Maths
Alex       Maths
Stuart     Maths
In the Subject table the candidate key is the {Student, Subject} column pair. Both of the
above tables now qualify for Second Normal Form and are largely free of update anomalies.
There are, however, a few complex cases in which a table in Second Normal Form still
suffers update anomalies; Third Normal Form exists to handle those scenarios.
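The 2NF decomposition above can be sketched as two Python structures built from the 1NF rows: Age moves to a table keyed by Student alone, while the {Student, Subject} table keeps only key attributes, leaving no partial dependency:

```python
# The 1NF rows from the earlier example.
first_nf = [
    ("Adam", 15, "Biology"),
    ("Adam", 15, "Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

# Student table: key {Student}; Age fully depends on the whole key.
student = {name: age for name, age, _ in first_nf}

# Subject table: key {Student, Subject}; no non-key attribute remains,
# so no column depends on only part of the concatenated key.
subject = sorted({(name, subj) for name, _, subj in first_nf})
print(student)
print(subject)
```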
3) Third Normal Form (3NF): Third Normal Form requires that every non-prime attribute of a
table be dependent directly on the primary key; that is, no non-prime attribute may be
determined by another non-prime attribute. Such a transitive functional dependency must be
removed from the table, and the table must also be in Second Normal Form.
For example, consider a table with following fields.
Student_Detail Table:
In this table Student_id is the primary key, but Street, City and State depend upon Zip. The
dependency of these fields on Zip is called a transitive dependency. Hence, to achieve 3NF,
we move Street, City and State to a new table, with Zip as its primary key.
Address Table:
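The 3NF decomposition can be sketched the same way. The field names follow the text (Student_id, Zip, Street, City, State); the sample rows are invented for illustration:

```python
# Student_Detail rows with the transitive dependency
# Student_id -> Zip -> (Street, City, State). Sample data is invented.
student_detail = [
    ("S1", "201301", "Sector 62", "Noida", "UP"),
    ("S2", "201301", "Sector 62", "Noida", "UP"),
    ("S3", "110001", "Janpath", "Delhi", "Delhi"),
]

# 3NF: Zip stays in the student table; the address fields move to a
# table keyed by Zip, so no non-prime attribute determines another.
student = [(sid, zip_) for sid, zip_, *_ in student_detail]
address = {zip_: (street, city, state)
           for _, zip_, street, city, state in student_detail}
print(student)
print(address)
```

Note how the duplicated address data for the shared Zip collapses into a single Address row, which is exactly the redundancy 3NF removes.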
4) Boyce-Codd Normal Form (BCNF): BCNF is a stricter version of Third Normal Form. It deals
with certain types of anomalies that are not handled by 3NF. A 3NF table which does not have
multiple overlapping candidate keys is already in BCNF. For a table to be in BCNF, the following
conditions must be satisfied: