
UNIT-2

 Data Independence
The three-schema architecture can be used to explain the concept of data independence,
which can be defined as the capacity to change the schema at one level of a database system
without having to change the schema at the next higher level. We can define two types of data
independence:

1. Logical data independence is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database (by adding a record type or data item), or to
reduce the database (by removing a record type or data item). In the latter case,
external schemas that refer only to the remaining data should not be affected. Only the
view definition and the mappings need be changed in a DBMS that supports logical data
independence. Application programs that reference the external schema constructs
must work as before, after the conceptual schema undergoes a logical reorganization.
Changes to constraints can also be applied to the conceptual schema without affecting
the external schemas or application programs.

2. Physical data independence is the capacity to change the internal schema without having
to change the conceptual (or external) schemas. Changes to the internal schema may be
needed because some physical files had to be reorganized—for example, by creating
additional access structures—to improve the performance of retrieval or update. If the
same data as before remains in the database, we should not have to change the
conceptual schema.

Whenever we have a multiple-level DBMS, its catalog must be expanded to include information
on how to map requests and data among the various levels. The DBMS uses additional software
to accomplish these mappings by referring to the mapping information in the catalog. Data
independence is accomplished because, when the schema is changed at some level, the schema
at the next higher level remains unchanged; only the mapping between the two levels is
changed. Hence, application programs referring to the higher-level schema need not be
changed.
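As a small illustration of logical data independence, a view can play the role of an external schema. The following is a minimal sketch using Python's sqlite3 module; the table, view, and column names are hypothetical, not from the text:

```python
import sqlite3

# A minimal sketch of logical data independence: a view serves as the
# external schema, insulating the application from conceptual-schema change.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conceptual schema: a base table of employees.
cur.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO employee (emp_id, name) VALUES (1, 'Adam')")

# External schema: the application queries this view, not the table.
cur.execute("CREATE VIEW emp_view AS SELECT emp_id, name FROM employee")

# Expand the conceptual schema by adding a data item (a new column).
cur.execute("ALTER TABLE employee ADD COLUMN dept TEXT")

# The application-level query works unchanged: only the mapping between
# the view and the reorganized base table is involved.
print(cur.execute("SELECT name FROM emp_view").fetchall())  # [('Adam',)]
```

The application never references the base table directly, so expanding the conceptual schema requires no change to the query against the view.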

The three-schema architecture can make it easier to achieve true data independence, both
physical and logical. However, the two levels of mappings create an overhead during
compilation or execution of a query or program, leading to inefficiencies in the DBMS. Because
of this, few DBMSs have implemented the full three-schema architecture.
 DATABASE ADMINISTRATION
Database administration is the function of managing and maintaining database management
systems (DBMS) software. Mainstream DBMS software such as Oracle, IBM DB2 and Microsoft
SQL Server needs ongoing management. As such, corporations that use DBMS software often
hire specialized IT (Information Technology) personnel called Database Administrators, or DBAs.

 Types of database administration


1) Systems DBAs (also referred to as Physical DBAs, Operations DBAs or Production
Support DBAs): focus on the physical aspects of database administration such as DBMS
installation, configuration, patching, upgrades, backups, restores, refreshes,
performance optimization, maintenance and disaster recovery.
2) Development DBAs: focus on the logical and development aspects of database
administration such as data model design and maintenance, DDL (data definition
language) generation, SQL writing and tuning, coding stored procedures, collaborating
with developers to help choose the most appropriate DBMS feature/functionality and
other pre-production activities.
3) Application DBAs: usually found in organizations that have purchased 3rd party
application software such as ERP (enterprise resource planning) and CRM (customer
relationship management) systems. Examples of such application software include
Oracle Applications, Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP.
Application DBAs straddle the fence between the DBMS and the application software
and are responsible for ensuring that the application is fully optimized for the database
and vice versa. They usually manage all the application components that interact with
the database and carry out activities such as application installation and patching,
application upgrades, database cloning, building and running data cleanup routines,
data load process management, etc.

While individuals usually specialize in one type of database administration, in smaller
organizations it is not uncommon to find a single individual or group performing more than one
type of database administration.

 Nature of database administration

The degree to which the administration of a database is automated dictates the skills and
personnel required to manage databases. On one end of the spectrum, a system with minimal
automation will require significant experienced resources to manage; perhaps 5-10 databases
per DBA. Alternatively, an organization might choose to automate a significant amount of the
work that could be done manually, thereby reducing the skills required to perform tasks. As
automation increases, the personnel needs of the organization split into highly skilled workers
who create and manage the automation and a group of lower-skilled "line" DBAs who simply
execute it.
Database administration work is complex, repetitive, time-consuming and requires significant
training. Since databases hold valuable and mission-critical data, companies usually look for
candidates with multiple years of experience. Database administration often requires DBAs to
put in work during off-hours (for example, for planned after hours downtime, in the event of a
database-related outage or if performance has been severely degraded). DBAs are commonly
well compensated for the long hours.

One key skill required and often overlooked when selecting a DBA is database recovery (under
disaster recovery). It is not a case of “if” but a case of “when” a database suffers a failure,
ranging from a simple failure to a full catastrophic failure. The failure may be data corruption,
media failure, or user-induced error. In any of these situations, the DBA must have the skills to
recover the database to a given point in time to prevent a loss of data. Even a highly skilled DBA
may need anywhere from a few minutes to many hours to bring the database back to an
operational state.

 Database administration tools


Often, the DBMS software comes with certain tools to help DBAs manage the DBMS. Such tools
are called native tools. For example, Microsoft SQL Server comes with SQL Server Management
Studio and Oracle has tools such as SQL*Plus and Oracle Enterprise Manager/Grid Control. In
addition, 3rd parties such as BMC, Quest Software, Embarcadero Technologies, EMS Database
Management Solutions and SQL Maestro Group offer GUI tools to monitor the DBMS and help
DBAs carry out certain functions inside the database more easily.

Another kind of database software exists to manage the provisioning of new databases and the
management of existing databases and their related resources. The process of creating a new
database can consist of hundreds or thousands of unique steps from satisfying prerequisites to
configuring backups, where each step must be successful before the next can start. A human
cannot be expected to complete this procedure in exactly the same way time after time, yet that
consistency is precisely what is needed when multiple databases exist. Without automation, as
the number of DBAs grows, the number of unique configurations frequently grows to be costly
and difficult to support. All of these complicated procedures can be modeled by the best DBAs
into database automation software and executed by the standard DBAs. Software has been
created specifically to improve the reliability and repeatability of these procedures, such as
Stratavia's Data Palette and GridApp Systems Clarity.

 The DBA should possess the following skills


1) A good knowledge of the operating system(s)
2) A good knowledge of physical database design
3) Ability to perform both Oracle and operating system performance monitoring and make the
necessary adjustments.
4) Be able to provide a strategic database direction for the organization.
5) Excellent knowledge of Oracle backup and recovery scenarios.
6) Good skills in all Oracle tools.
7) A good knowledge of Oracle security management.
8) A good knowledge of how Oracle acquires and manages resources.
9) Sound knowledge of the applications at your site.
10) Experience and knowledge in migrating code, database changes, data and menus through
the various stages of the development life cycle.
11) A good knowledge of the way Oracle enforces data integrity.
12) A sound knowledge of both database and program code performance tuning.
13) A sound understanding of the business.
14) Sound communication skills with management, development teams, vendors, systems
administrators and other related service providers.

 DBA Responsibilities:
 Installation, configuration and upgrading of Database server software and related
products.
 Evaluate Database features and Database related products.
 Establish and maintain sound backup and recovery policies and procedures.
 Take care of the Database design and implementation.
 Implement and maintain database security (create and maintain users and roles, assign
privileges).
 Database tuning and performance monitoring.
 Application tuning and performance monitoring.
 Setup and maintain documentation and standards.
 Plan growth and changes (capacity planning).
 Work as part of a team and provide 24x7 support when required.
 Do general technical troubleshooting and give consultation to development teams.
 Database recovery.

UNIT-3
 Database Model
A Database model defines the logical design of data. The model describes the relationships
between different parts of the data. Historically, in database design, three models are
commonly used. They are:

1) Hierarchical Model: In this model, each entity has only one parent but can have several
children. At the top of the hierarchy there is only one entity, which is called the root.
 Advantages:
 Simplicity: Data naturally have hierarchical relationships in most practical situations, so
it is easier to view data arranged in this manner. This makes this type of database more
suitable for such purposes.
 Security: These database systems can enforce varying degrees of security, unlike
flat-file systems.
 Database Integrity: Because of its inherent parent-child structure, database integrity is
highly promoted in these systems.
 Efficiency: The hierarchical database model is very efficient when the database
contains a large number of 1:N (one-to-many) relationships and when users require a
large number of transactions using data whose relationships are fixed.

 Disadvantages:
 Complexity of Implementation: The actual implementation of a hierarchical database
depends on the physical storage of data, which makes the implementation complicated.
 Difficulty in Management: The movement of a data segment from one location to
another causes all the accessing programs to be modified, making database management
a complex affair.
 Complexity of Programming: Programming a hierarchical database is relatively complex
because the programmers must know the physical path of the data items.
 Operational Anomalies: The hierarchical model suffers from insertion, update and
deletion anomalies; the retrieval operation is also complex and asymmetric. Thus the
hierarchical model is not suitable for all cases.

2) Network Model: In the network model, entities are organized in a graph, in which some
entities can be accessed through several paths.

 Advantages:
 Conceptual simplicity: Just like the hierarchical model, the network model is also
conceptually simple and easy to design.
 Capability to handle more relationship types: The network model can handle both one-
to-many (1:N) and many-to-many (N:N) relationships, which is a real help in modeling
real-life situations.
 Ease of data access: Data access is easier and more flexible than in the hierarchical model.
 Data Integrity: The network model does not allow a member to exist without an
owner. Thus, a user must first define the owner record and then the member record.
This ensures the data integrity.
 Data independence: The network model is better than the hierarchical model in
isolating the programs from the complex physical storage details.

 Disadvantages:
 System complexity: All the records are maintained using pointers and hence the whole
database structure becomes very complex.
 Operational Anomalies: The network model's insertion, deletion and update operations
on any record require a large number of pointer adjustments, which makes its
implementation very complex and complicated.
 Absence of structural independence: Since the data access method in the network
database model is a navigational system, making structural changes to the database is
very difficult in most cases and impossible in some cases. If changes are made to the
database structure then all the application programs need to be modified before they
can access data. Thus, even though the network database model succeeds in achieving
data independence, it still fails to achieve structural independence.

3) Relational Model: In this model, data is organized in two-dimensional tables called
relations. The tables or relations are related to each other.

 Advantages:
 Structural independence: In relational model, changes in the database structure do not
affect the data access. When it is possible to make change to the database structure
without affecting the DBMS's capability to access data, we can say that structural
independence has been achieved. So, relational database model has structural
independence.
 Conceptual simplicity: We have seen that both the hierarchical and the network
database model were conceptually simple. But the relational database model is even
simpler at the conceptual level. Since the relational data model frees the designer from
the physical data storage details, the designers can concentrate on the logical view of
the database.
 Design, implementation, maintenance and usage ease: The relational database model
achieves both data independence and structural independence, making database
design, maintenance, administration and usage much easier than in the other models.

 Disadvantages:
 Hardware overheads: A relational database system hides the implementation
complexities and the physical data storage details from the users. To do this, i.e. to
make things easier for the users, relational database systems need more powerful
computers and data storage devices; the RDBMS needs powerful machines to run
smoothly. But as the processing power of modern computers increases at an
exponential rate, the need for more processing power is no longer a very big issue.
 Ease of design can lead to bad design: The relational database is easy to design and
use. The users need not know the complex details of physical data storage. They need
not know how the data is actually stored to access it. This ease of design and use can
lead to the development and implementation of very poorly designed database
management systems.
 'Information island' phenomenon: As we have said before, the relational database
systems are easy to implement and use. This will create a situation where too many
people or departments will create their own databases and applications. These
information islands will prevent the information integration that is essential for the
smooth and efficient functioning of the organization. These individual databases will
also create problems like data inconsistency, data duplication, data redundancy and so
on.

 Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is
a systematic approach of decomposing tables to eliminate data redundancy and undesirable
characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step process that
puts data into tabular form by removing duplicated data from the relation tables.
Normalization is used mainly for two purposes:
 Eliminating redundant (useless) data.
 Ensuring data dependencies make sense i.e. data is logically stored.

 Problem without Normalization: Without normalization, it becomes difficult to handle and
update the database without facing data loss. Insertion, update and deletion anomalies
are very frequent if the database is not normalized. To understand these anomalies, let us
take the example of a Student table.

S_id S_Name S_Address Subject_opted


401 Adam Noida Bio
402 Alex Panipat Maths
403 Stuart Jammu Maths
404 Adam Noida Physics

 Update Anomaly: To update the address of a student who occurs twice or more in the
table, we have to update the S_Address column in all those rows, else the data will
become inconsistent.
 Insertion Anomaly: Suppose for a new admission we have a student's id (S_id), name
and address, but the student has not opted for any subject yet; then we have to insert
NULL there, leading to an insertion anomaly.
 Deletion Anomaly: If (S_id) 401 has only one subject and temporarily drops it, then when
we delete that row, the entire student record will be deleted along with it.
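The deletion anomaly can be demonstrated directly on the unnormalized Student table. The following is a minimal sketch using Python's sqlite3 module with the sample data from the example:

```python
import sqlite3

# A sketch of the deletion anomaly on the unnormalized Student table:
# all facts about a student live in the same rows as the subject facts.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE student (s_id INT, s_name TEXT, s_address TEXT, subject_opted TEXT)"
)
cur.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    (401, "Adam", "Noida", "Bio"),
    (402, "Alex", "Panipat", "Maths"),
    (403, "Stuart", "Jammu", "Maths"),
    (404, "Adam", "Noida", "Physics"),
])

# Student 402 drops the only subject they opted for ...
cur.execute("DELETE FROM student WHERE s_id = 402 AND subject_opted = 'Maths'")

# ... and the whole student record disappears with it.
print(cur.execute("SELECT * FROM student WHERE s_id = 402").fetchall())  # []
```

Deleting the one subject row for Alex also erases his name and address, which is exactly the loss the deletion anomaly describes.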

 Normalization Rule: Normalization rule are divided into following normal form.

1) First Normal Form (1NF): As per First Normal Form, no row may contain a repeating group
of information; that is, each column must hold a single atomic value, so that multiple values
cannot be packed into one field of a row. Each table should be organized into rows, and each
row should have a primary key that distinguishes it as unique.
The primary key is usually a single column, but sometimes more than one column can be
combined to create a single primary key. For example, consider a table which is not in First
Normal Form.
Student Table:

Student Age Subject


Adam 15 Biology, Maths
Alex 14 Maths
Stuart 17 Maths

In First Normal Form, no row may have a column in which more than one value is saved (for
example, values separated with commas). Instead, we must separate such data into multiple rows.
The Student table following 1NF will be:

Student Age Subject


Adam 15 Biology
Adam 15 Maths
Alex 14 Maths
Stuart 17 Maths

Using the First Normal Form, data redundancy increases, as there will be many columns with the
same data in multiple rows, but each row as a whole will be unique.
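The 1NF transformation above can be sketched in a few lines of Python: each comma-separated subject value becomes its own row.

```python
# A minimal sketch of bringing the Student table into 1NF: a row whose
# Subject column holds "Biology, Maths" is split into one row per subject.
raw_rows = [
    ("Adam", 15, "Biology, Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

normalized = [
    (student, age, subject.strip())
    for student, age, subjects in raw_rows
    for subject in subjects.split(",")
]

print(normalized)
# [('Adam', 15, 'Biology'), ('Adam', 15, 'Maths'),
#  ('Alex', 14, 'Maths'), ('Stuart', 17, 'Maths')]
```

Note how Adam's age is now repeated in two rows: this is the redundancy that the text mentions and that Second Normal Form addresses.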

2) Second Normal Form (2NF): As per the Second Normal Form, there must not be any partial
dependency of any column on the primary key. This means that for a table with a concatenated
primary key, each column that is not part of the primary key must depend upon the entire
concatenated key for its existence. If any column depends on only one part of the concatenated
key, the table fails Second Normal Form.

In the First Normal Form example there are two rows for Adam, to include the multiple subjects
that he has opted for. While this is searchable and follows First Normal Form, it is an inefficient
use of space. Also, in the above table in First Normal Form, while the candidate key is
{Student, Subject}, the Age of a student depends only on the Student column, which is incorrect
as per Second Normal Form. To achieve Second Normal Form, it is helpful to split the subjects
out into an independent table and match them up using the student names as foreign keys.
New Student Table following 2NF will be:

Student Age
Adam 15
Alex 14
Stuart 17
In the Student table the candidate key will be the Student column, because the only other
column, Age, is dependent on it.
New Subject Table introduced for 2NF will be:

Student Subject
Adam Biology
Adam Maths
Alex Maths
Stuart Maths

In the Subject table the candidate key will be the {Student, Subject} column pair. Now both the
above tables qualify for Second Normal Form and will never suffer from update anomalies. There
are, however, a few complex cases in which a table in Second Normal Form still suffers update
anomalies; Third Normal Form exists to handle those scenarios.
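The 2NF decomposition described above can be sketched as follows, using the 1NF rows from the earlier example:

```python
# A sketch of the 2NF decomposition: Age depends only on Student, so it
# moves into its own table keyed by Student; the {Student, Subject}
# pairs form the new Subject table.
rows_1nf = [
    ("Adam", 15, "Biology"),
    ("Adam", 15, "Maths"),
    ("Alex", 14, "Maths"),
    ("Stuart", 17, "Maths"),
]

# Student table: one row per student; Age fully depends on the key.
student_table = {student: age for student, age, _ in rows_1nf}

# Subject table: the remaining attributes of the original composite key.
subject_table = [(student, subject) for student, _, subject in rows_1nf]

print(student_table)  # {'Adam': 15, 'Alex': 14, 'Stuart': 17}
print(subject_table)
```

Adam's age is now stored exactly once, so updating it can no longer leave the table inconsistent.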

3) Third Normal Form (3NF): Third Normal Form requires that every non-prime attribute of a
table be dependent on the primary key; that is, there should be no case where a non-prime
attribute is determined by another non-prime attribute. Such transitive functional dependencies
must be removed from the table, and the table must also be in Second Normal Form.
For example, consider a table with the following fields.

Student_Detail Table:

Student_id Student_name DOB Street City State Zip

In this table Student_id is the primary key, but Street, City and State depend upon Zip. The
dependency between Zip and the other fields is called a transitive dependency. Hence, to
achieve 3NF, we need to move Street, City and State to a new table, with Zip as the primary key.

New Student_Detail Table:

Student_id Student_name DOB Zip

Address Table:

Zip Street City State

The advantages of removing transitive dependencies are:

 The amount of data duplication is reduced.
 Data integrity is achieved.
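The 3NF decomposition above can be sketched as follows; the sample values for the Student_Detail rows are hypothetical, invented only to illustrate the split:

```python
# A sketch of removing the transitive dependency Student_id -> Zip -> City:
# the address attributes move into a separate table keyed by Zip.
# The sample rows below are hypothetical illustration data.
student_detail = [
    # (student_id, student_name, dob, street, city, state, zip)
    (1, "Adam", "2001-05-01", "MG Road", "Noida", "UP", "201301"),
    (2, "Alex", "2002-01-15", "GT Road", "Panipat", "HR", "132103"),
]

# New Student_Detail table: keeps only Zip as a foreign key.
students_3nf = [
    (sid, name, dob, zip_)
    for sid, name, dob, _, _, _, zip_ in student_detail
]

# Address table: one row per Zip, holding the transitively dependent fields.
address_table = {
    zip_: (street, city, state)
    for _, _, _, street, city, state, zip_ in student_detail
}

print(students_3nf)
print(address_table)
```

Every address fact is now stored once per Zip, so correcting a city name requires a single update in the Address table.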

4) Boyce-Codd Normal Form (BCNF): BCNF is a higher version of Third Normal Form. This form
deals with a certain type of anomaly that is not handled by 3NF. A 3NF table which does not have
multiple overlapping candidate keys is guaranteed to be in BCNF. For a relation R to be in BCNF,
the following conditions must be satisfied:

 R must be in Third Normal Form.
 For each functional dependency (X -> Y), X must be a super key.
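The super-key test above can be sketched in Python. The relation R(Student, Subject, Professor) and its dependencies below are hypothetical examples, not taken from the text:

```python
# A sketch of the BCNF test: for every functional dependency X -> Y,
# X must be a super key, i.e. the closure of X must contain every
# attribute of the relation.
def closure(attrs, fds):
    """Closure of an attribute set under the given functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(relation, fds):
    """True if the left side of every FD is a super key of the relation."""
    return all(closure(lhs, fds) == set(relation) for lhs, _ in fds)

# R(Student, Subject, Professor): each professor teaches one subject,
# and a student's choice of subject determines the professor.
fds = [(("Professor",), ("Subject",)),
       (("Student", "Subject"), ("Professor",))]

# Professor -> Subject violates BCNF: Professor is not a super key.
print(is_bcnf({"Student", "Subject", "Professor"}, fds))  # False
```

This is the classic case of a 3NF relation that fails BCNF: the dependency Professor -> Subject has a non-super-key determinant, so the relation would need further decomposition.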
