Unit 1
Data:
• Data consists of raw facts, figures, or entities.
• When activities take place in an organization, their effects need to be recorded; these records are
known as data.
Information:
Processed data is called information.
The purpose of data processing is to generate the information required for carrying out business
activities.
Database:
A database is a collection of interrelated data. Data is stored in it according to a defined set of rules so
that it is organized systematically, and the data fields are related to each other. For example, an
organization can use a database to calculate employees' salaries from their attendance and to work out the
various allowances paid to them. Today, a database is needed to keep almost any record, from billing an
item to taking attendance in a classroom.
Evolution of DBMS: -
A Database Management System (DBMS) is a collection of programs that enables users to create and
maintain a database.
The DBMS is hence a general-purpose software system that facilitates the process of defining, constructing,
and manipulating databases for various applications.
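The three activities named above (defining, constructing, and manipulating a database) can be sketched with Python's built-in `sqlite3` module, which embeds the SQLite DBMS. This is a minimal illustration under stated assumptions: the `employee` table, its columns, and the rupee figures are hypothetical, chosen only to echo the salary-from-attendance example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # an in-memory SQLite database

# Defining: declare the structure (schema) of the data.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
    "days_present INTEGER, daily_rate INTEGER)"
)

# Constructing: store the initial data under that structure.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", 22, 1000), (2, "Ravi", 20, 1200)],
)

# Manipulating: update the data, then query it to compute salaries.
conn.execute("UPDATE employee SET days_present = days_present + 1 WHERE emp_id = 2")
salaries = conn.execute(
    "SELECT name, days_present * daily_rate FROM employee ORDER BY emp_id"
).fetchall()
print(salaries)  # [('Asha', 22000), ('Ravi', 25200)]
```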
RDBMS:
A relational database management system is a database management system used to manage relational
databases. A relational database is one where tables of data can have relationships based on primary and
foreign keys.
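A short sketch of a primary-key/foreign-key relationship, again using SQLite through Python's `sqlite3`. The `department` and `employee` tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other RDBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
    "dept_id INTEGER REFERENCES department(dept_id))"  # foreign key to department
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # valid: department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")  # there is no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```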
Advantages of DBMS:
Because of its centralized nature, a database system overcomes the disadvantages of file-system-based
systems:
● Concurrent use: A database system allows several users to access the database concurrently. For
example, in an online movie booking system, employees at different branches access the same database at
the same time. Each employee serves customers at their own desk and can see the seats available for
booking through the interface provided to them.
● Structured data: An important feature of a database system is that it not only stores data and provides
access to it but also describes the data and how to access and use it. For example, when you fill in an
online exam form, details are given about every field: what type of data it accepts and in what format.
● Data Independence: Another important characteristic of a DBMS is data independence: the application
through which users interact with the data does not depend on how the data is physically stored. So, a
change in the physical storage does not require changes to the application programs, and a change in an
application program does not affect the data stored in the database.
● Integrity: This characteristic of a DBMS deals with a basic property of data called integrity: data that
is saved in the database and later retrieved from it must be the same. Integrity also covers restricting
unauthorized access that could change the data.
● Transaction of Data: A transaction is a set of actions performed on a database that takes it from one
consistent state to another. If transactions are not handled properly, the database may be left in an
inconsistent state and data may be lost, so a DBMS enforces the basic properties of transactions:
atomicity, consistency, isolation, and durability.
● Data Persistence: This is one of the basic characteristics of a database: data can be retained in the
database for years and must remain in the same condition. For example, a bank customer may open an
account and maintain it for life, and the records of an LIC policy or a mediclaim policy must likewise be
kept for the life of the policy.
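Persistence can be illustrated by committing data, closing the program's connection, and reopening the database file. A minimal sketch with SQLite; the temporary file path and the `account` table are hypothetical stand-ins for a real bank database.

```python
import os
import sqlite3
import tempfile

# Hypothetical database file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "bank.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, holder TEXT, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('SB-001', 'Asha', 5000)")
conn.commit()  # committed data is written to the database file
conn.close()   # the program's connection ends...

# ...but the data survives and is still there when the database is reopened.
conn = sqlite3.connect(path)
rows = conn.execute("SELECT acc_no, holder, balance FROM account").fetchall()
conn.close()
print(rows)  # [('SB-001', 'Asha', 5000)]
```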
Real-world entity: A modern DBMS is more realistic and uses real-world entities to design its architecture.
It also models their behavior and attributes. For example, a school database may use students as an entity
and their age as an attribute.
Relation-based tables: DBMS allows entities and relations among them to form tables. A user can
understand the architecture of a database just by looking at the table names.
Isolation of data and application: A database system is entirely different from its data. A database is an
active entity, whereas data is said to be passive: it is what the database works on and organizes. DBMS also
stores metadata, which is data about data, to ease its own process.
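As a small illustration of metadata, SQLite can report the definition of a table it stores. This sketch uses SQLite's `PRAGMA table_info` and its `sqlite_master` catalog table; the `student` table is hypothetical, and other DBMSs expose the same kind of information through their own catalogs (for example, `information_schema` views).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL, dob TEXT)")

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, default, pk).
columns = [
    (name, col_type)
    for _cid, name, col_type, _notnull, _default, _pk in conn.execute("PRAGMA table_info(student)")
]

# The original CREATE TABLE text is kept in SQLite's catalog table, sqlite_master.
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
).fetchone()[0]

print(columns)  # [('roll_no', 'INTEGER'), ('name', 'TEXT'), ('dob', 'TEXT')]
print(ddl)
```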
Less redundancy: DBMS follows the rules of normalization, which splits a relation when any of its
attributes has redundant values. Normalization is a mathematically rich and scientific process that
reduces data redundancy.
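The core idea of splitting a relation whose attribute values repeat can be shown in plain Python. This is only a sketch of one decomposition step, not a full normalization procedure; the employee and department rows are hypothetical.

```python
# Unnormalized relation: the department name is repeated for every employee in it.
employees_flat = [
    (1, "Asha", 10, "Sales"),
    (2, "Ravi", 10, "Sales"),
    (3, "Meera", 20, "Production"),
]

# Split into two relations: each department name is now stored exactly once,
# and employees refer to their department by dept_id.
department = {}
employee = []
for emp_id, name, dept_id, dept_name in employees_flat:
    department[dept_id] = dept_name
    employee.append((emp_id, name, dept_id))

# Joining the two relations on dept_id recovers the original rows without loss.
rejoined = [(e, n, d, department[d]) for e, n, d in employee]
assert rejoined == employees_flat
print(department)  # {10: 'Sales', 20: 'Production'}
```

If a department is renamed, only one entry in `department` changes, instead of every employee row that mentioned it.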
Consistency: Consistency is a state in which every relation in a database remains consistent. There exist
methods and techniques that can detect attempts to leave the database in an inconsistent state.
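One such technique is a declared integrity constraint that the DBMS checks on every change. A minimal SQLite sketch, assuming a hypothetical rule that an account balance may never be negative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (acc_no TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"  # the consistency rule
)
conn.execute("INSERT INTO account VALUES ('SB-001', 500)")

rejected = False
try:
    # This update would leave a negative balance, violating the declared rule.
    conn.execute("UPDATE account SET balance = balance - 800 WHERE acc_no = 'SB-001'")
except sqlite3.IntegrityError:
    rejected = True  # the DBMS detected the attempt and refused the statement

balance = conn.execute("SELECT balance FROM account").fetchone()[0]
print(rejected, balance)  # True 500
```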
Query Language: A DBMS is equipped with a query language, which makes it more efficient to retrieve and
manipulate data. A user can apply as many different filtering options as required to retrieve a set of
data. This was not possible in traditional file-processing systems.
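With a query language, several filters, a sort order, and a choice of columns fit into one declarative statement, where a file-processing system would need a hand-written loop for each new request. A minimal SQLite sketch with a hypothetical `student` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT, dept TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Asha", "CS", 82), (2, "Ravi", "CS", 67), (3, "Meera", "IT", 91), (4, "Kiran", "CS", 88)],
)

# Two filter conditions, a sort order, and a projection in a single query.
toppers = conn.execute(
    "SELECT name, marks FROM student WHERE dept = ? AND marks >= ? ORDER BY marks DESC",
    ("CS", 80),
).fetchall()
print(toppers)  # [('Kiran', 88), ('Asha', 82)]
```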
ACID Properties: DBMS follows the concepts of Atomicity, Consistency, Isolation, and Durability (normally
shortened as ACID). These concepts are applied on transactions, which manipulate data in a database.
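Atomicity, the "A" in ACID, can be seen in a money transfer: either both updates happen or neither does. This sketch assumes hypothetical accounts and a `CHECK` rule forbidding negative balances; Python's `sqlite3` connection, used as a context manager, commits on success and rolls back if an exception escapes. The credit is deliberately applied before the debit so that a failing debit shows the credit being undone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction: both updates happen, or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

first = transfer(conn, "A", "B", 300)    # succeeds
second = transfer(conn, "B", "A", 5000)  # debit violates CHECK, so the credit to A is rolled back
balances = dict(conn.execute("SELECT acc_no, balance FROM account ORDER BY acc_no"))
print(first, second, balances)  # True False {'A': 700, 'B': 500}
```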
Multiuser and Concurrent Access: DBMS supports a multi-user environment and allows users to access and
manipulate data in parallel. Although there are restrictions on transactions when users attempt to handle
the same data item, users are unaware of them.
Multiple views: DBMS offers multiple views for different users. A user who is in the Sales department will
have a different view of database than a person working in the Production department. This feature
enables the users to have a focused view of the database according to their requirements.
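One common way to provide such department-specific views is SQL's `CREATE VIEW`. A minimal SQLite sketch; the `orders` table, its columns, and the two views are hypothetical. SQLite has no per-user permissions, so in a server DBMS each department would additionally be granted access only to its own view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount INTEGER, units INTEGER, machine TEXT)"
)
conn.execute("INSERT INTO orders VALUES (1, 'Acme', 5000, 50, 'M-7')")

# Two views over the same base table, each exposing only what one department needs.
conn.execute("CREATE VIEW sales_view AS SELECT order_id, customer, amount FROM orders")
conn.execute("CREATE VIEW production_view AS SELECT order_id, units, machine FROM orders")

sales_rows = conn.execute("SELECT * FROM sales_view").fetchall()
production_rows = conn.execute("SELECT * FROM production_view").fetchall()
print(sales_rows)       # [(1, 'Acme', 5000)]
print(production_rows)  # [(1, 50, 'M-7')]
```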
Security: Features like multiple views offer security to some extent, because users are unable to access
the data of other users and departments. DBMS offers methods to impose constraints while entering data into
the database and retrieving the same at a later stage.
DBMS Architecture:
The design of a DBMS depends on its architecture. It can be centralized or decentralized or hierarchical.
The architecture of a DBMS can be seen as either single tier or multi-tier. An n-tier architecture divides the
whole system into related but independent n modules, which can be independently modified, altered,
changed, or replaced.
In 1-tier architecture, the DBMS is the only entity where the user directly sits on the DBMS and uses it. Any
changes done here will directly be done on the DBMS itself. It does not provide handy tools for end-users.
Database designers and programmers normally prefer to use single-tier architecture.
If the architecture of DBMS is 2-tier, then it must have an application through which the DBMS can be
accessed. Programmers use 2-tier architecture where they access the DBMS by means of an application.
Here the application tier is entirely independent of the database in terms of operation, design, and
programming.
3-tier Architecture:
Database (Data) Tier: At this tier, the database resides along with its query processing languages. We also
have the relations that define the data and their constraints at this level.
Application (Middle) Tier: At this tier reside the application server and the programs that access the
database. For a user, this application tier presents an abstracted view of the database. End-users are
unaware of any existence of the database beyond the application. At the other end, the database tier is
not aware of any other user beyond the application tier. Hence, the application layer sits in the middle and
acts as a mediator between the end-user and the database.
User (Presentation) Tier: End-users operate on this tier and they know nothing about any existence of the
database beyond this layer. At this layer, multiple views of the database can be provided by the
application. All views are generated by applications that reside in the application tier.
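The division of responsibilities in a 3-tier architecture can be sketched in a single Python file, with one function per tier. This is only an illustration: in a real deployment each tier usually runs as a separate process or machine, and the `seats` table and function names here are hypothetical.

```python
import sqlite3

# Data tier: the database, its relations, and their constraints.
def make_data_tier():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seats (show_id INTEGER, seat_no TEXT, booked INTEGER CHECK (booked IN (0, 1)))")
    conn.executemany("INSERT INTO seats VALUES (?, ?, ?)", [(1, "A1", 0), (1, "A2", 1), (1, "A3", 0)])
    return conn

# Application tier: the only code that talks to the database; it exposes a business operation.
def available_seats(conn, show_id):
    rows = conn.execute(
        "SELECT seat_no FROM seats WHERE show_id = ? AND booked = 0 ORDER BY seat_no", (show_id,)
    )
    return [seat_no for (seat_no,) in rows]

# Presentation tier: formats what the application tier returns; it never sees SQL or tables.
def render(seats):
    return "Available: " + ", ".join(seats) if seats else "Sold out"

db = make_data_tier()
screen = render(available_seats(db, 1))
print(screen)  # Available: A1, A3
```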
Users are differentiated by the way they expect to interact with the system:
Application programmers:
Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces.
Rapid application development (RAD) tools are tools that enable an application programmer to construct
forms and reports without writing a program.
Sophisticated users:
Sophisticated users interact with the system without writing programs. Instead, they form their requests in
a database query language.
Specialized users:
Specialized users are sophisticated users who write specialized database applications that do not fit into
the traditional data-processing framework.
Naïve users:
Naive users are unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously.
Database Administrator:
The database administrator (DBA) coordinates all the activities of the database system and has a good
understanding of the enterprise’s information resources and needs.
Query Processor:
The query processor accepts queries from users and answers them by accessing the database.
Parts of Query processor:
DDL interpreter
This interprets DDL statements and records the definitions in the data dictionary.
DML compiler
a. It translates DML statements in a query language into low-level instructions that the query
evaluation engine understands.
b. A query can usually be translated into any of a number of alternative evaluation plans that all give the
same result; the DML compiler performs query optimization, selecting the best (lowest-cost) plan.
Query evaluation engine
This engine executes the low-level instructions generated by the DML compiler.
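Plan selection by the optimizer can be observed with SQLite's `EXPLAIN QUERY PLAN`, which reports the plan chosen for a query without running it. In this sketch (hypothetical `student` table and index name), the same query text gets a full table scan before an index exists and an index search afterwards. The exact wording of the plan text varies between SQLite versions, so no output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO student (name, dept) VALUES (?, ?)",
    [(f"student{i}", "CS" if i % 2 else "IT") for i in range(1000)],
)

query = "SELECT name FROM student WHERE dept = 'CS'"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable 'detail' string.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(query)  # no index on dept yet, so a full table scan is the only option
conn.execute("CREATE INDEX idx_student_dept ON student (dept)")
after = plan(query)   # same query text, but the optimizer can now choose the index
print(before)
print(after)
```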
Storage Manager/Storage Management:
A storage manager is a program module that provides the interface between the data stored in a database
and the application programs and queries submitted to the system.
The storage manager components include:
Authorization and integrity manager: Checks for integrity constraints and authority of users to access
data.
Transaction manager: Ensures that the database remains in a consistent state despite system failures, and
that concurrent transaction executions proceed without conflicting.
File manager: Manages the allocation of space on disk storage and the data structures used to represent
information stored on disk.
Buffer manager: It is responsible for fetching data from disk storage into main memory and deciding what
data to cache in main memory. It enables the database to handle data sizes that are much larger than the
size of main memory.
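The buffer manager's job can be sketched as a small page cache in front of slower storage. This toy version assumes a least-recently-used (LRU) replacement policy, which is one common choice among several; it keeps pages in an `OrderedDict`, uses a plain dict as a stand-in for the disk, and ignores real-world concerns such as writing back modified pages and pinning.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: keeps at most `capacity` pages in memory, evicting the least recently used."""

    def __init__(self, disk, capacity):
        self.disk = disk             # dict page_id -> contents, standing in for disk storage
        self.capacity = capacity
        self.frames = OrderedDict()  # in-memory pages, ordered from least to most recently used
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:           # hit: served from main memory
            self.frames.move_to_end(page_id)
            return self.frames[page_id]
        self.disk_reads += 1                 # miss: fetch from "disk"
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)  # evict the least recently used page
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]

disk = {p: f"contents of page {p}" for p in range(10)}
pool = BufferPool(disk, capacity=2)
for p in [1, 2, 1, 3, 1, 2]:
    pool.get_page(p)
print(pool.disk_reads, list(pool.frames))  # 4 [1, 2]
```

Although the "disk" holds ten pages, only two are ever in memory at once, which is how a database can work with data much larger than main memory.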
The storage manager implements several data structures:
Data files: Store the database itself.
Data dictionary: Stores metadata about the structure of the database.
Indices: Provide fast access to data items.
Data Models:
Data models define how the logical structure of a database is modelled. Data Models are fundamental
entities to introduce abstraction in a DBMS. Data models define how data is connected to each other and
how they are processed and stored inside the system.
The very first data models were probably flat data models, in which all the data was kept in the same
plane. These early data models were not very scientific, so they were prone to introducing a lot of
duplication and update anomalies.
There are many kinds of data models. Some of the most common ones include:
● Hierarchical database model
● Relational model
● Network model
● Object-oriented database model
● Entity-relationship model
● Object-relational model
The most common model, the relational model, stores data in tables, also known as relations, each of
which consists of columns and rows. Each column lists an attribute of the entity in question, such as price,
zip code, or birth date, and the set of values an attribute is allowed to take is called its domain. A
particular attribute or combination of attributes is chosen as the primary key; when it is referred to in
another table, it is called a foreign key there.
Each row, also called a tuple, includes data about a specific instance of the entity in question, such as a
particular employee.
The model also accounts for the types of relationships between those tables, including one-to-one,
one-to-many, and many-to-many relationships.
Relational Model:
Within the database, tables can be normalized, or brought to comply with normalization rules that make
the database flexible, adaptable, and scalable. When normalized, each piece of data is atomic, or broken
into the smallest useful pieces.
Relational databases are typically defined and queried using Structured Query Language (SQL). The model
was introduced by E. F. Codd in 1970.
Hierarchical Model:
The hierarchical model organizes data into a tree-like structure, where each record has a single parent or
root. Sibling records are sorted in a particular order. That order is used as the physical order for storing the
database. This model is good for describing many real-world relationships.
Network Model:
The network model builds on the hierarchical model by allowing many-to-many relationships between
linked records, implying multiple parent records. Based on mathematical set theory, the model is
constructed with sets of related records. Each set consists of one owner or parent record and one or more
member or child records. A record can be a member or child in multiple sets, allowing this model to convey
complex relationships.
Object-Relational Model:
This hybrid database model combines the simplicity of the relational model with some of the advanced
functionality of the object-oriented database model. In essence, it allows designers to incorporate objects
into the familiar table structure.
Languages and call interfaces include SQL3, vendor languages, ODBC, JDBC, and proprietary call interfaces
that are extensions of the languages and interfaces used by the relational model.
Entity-Relationship Model:
This model captures the relationships between real-world entities much like the network model, but it isn’t
as directly tied to the physical structure of the database. Instead, it’s often used for designing a database
conceptually.
Here, the people, places, and things about which data points are stored are referred to as entities, each of
which has certain attributes. The relationships between entities, together with their cardinality, are
mapped as well.
Figure 1.8: E-R Diagram
Instances: The data stored in the database at a particular moment in time is called an instance of the
database. The database schema defines the variable declarations in the tables that belong to a particular
database; the values of these variables at a moment in time are called the instance of that database.
Data Independence:
A database system normally contains a lot of data in addition to users’ data. For example, it stores data
about data, known as metadata, to locate and retrieve data easily. It is rather difficult to modify or update
a set of metadata once it is stored in the database. But as a DBMS expands, it needs to change over time to
satisfy the requirements of the users. If all of the data were interdependent, making such changes would
become a tedious and highly complex job.
Logical data independence is a mechanism that separates the logical structure of the data from the actual
data stored on the disk: if we make changes to the table format, the data residing on the disk should not
have to change.
Physical data independence is the ability to change the physical storage of the data without affecting the
schema or the logical data. For example, if we want to change or upgrade the storage system itself, say by
replacing hard disks with SSDs, it should not have any impact on the logical data or schemas.
A data administrator (DA), also known as a database administration manager, data architect, or
information center manager, performs a high-level function responsible for the overall management of data
resources in an organization. To perform these duties, the DA must know a good deal of system analysis and
programming.
These are the functions of a data administrator (not to be confused with database administrator
functions):
E-R Diagram
Figure 1.11: E-R Diagram
ER Model is represented by means of an ER diagram. Any object, for example, entities, attributes of an
entity, relationship sets, and attributes of relationship sets, can be represented with the help of an ER
diagram.
Entity:
Entities are represented by means of rectangles. Rectangles are named with the entity set they represent.
Examples: Student, Faculty, Course.
Strong Entity: - This type of entity has a set of key attributes that is used to create a primary key. For
example, the Student entity has Roll No as a key attribute.
Weak Entity: - This type of entity does not have a primary key attribute of its own and depends on some
other entity that has the primary key attribute. For example, Employee Child is a weak entity that depends
on the Employee entity.
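When a weak entity is mapped to a table, its key is usually formed from the owner's key plus the weak entity's own partial key. A minimal SQLite sketch with hypothetical `employee` and `employee_child` tables; `ON DELETE CASCADE` is one way to express that child rows cannot outlive their owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce the cascade
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
# The weak entity's full key borrows emp_id from its owner and adds the partial key child_name.
conn.execute(
    "CREATE TABLE employee_child ("
    " emp_id INTEGER REFERENCES employee(emp_id) ON DELETE CASCADE,"
    " child_name TEXT,"
    " dob TEXT,"
    " PRIMARY KEY (emp_id, child_name))"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha')")
conn.executemany(
    "INSERT INTO employee_child VALUES (?, ?, ?)",
    [(1, "Riya", "2015-04-02"), (1, "Dev", "2018-09-15")],
)

# Deleting the owner removes its dependent weak-entity rows as well.
conn.execute("DELETE FROM employee WHERE emp_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM employee_child").fetchone()[0]
print(remaining)  # 0
```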
Composite Entity: - This type of entity is used to replace a many-to-many relationship: the relationship
itself is replaced by an entity that is linked to each of the original entities by a one-to-many
relationship. (Relationships in general have one of three cardinalities: one-to-one, one-to-many, and
many-to-many.)
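Mapped to tables, a composite entity becomes a bridge table. In this SQLite sketch, a hypothetical `enrollment` table resolves the many-to-many relationship between `student` and `course`; each enrollment row belongs to exactly one student and one course, and it can carry its own attributes such as the enrollment date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT)")
# Composite entity: each row links one student to one course (two one-to-many relationships).
conn.execute(
    "CREATE TABLE enrollment (roll_no INTEGER REFERENCES student, course_id TEXT REFERENCES course,"
    " enrolled_on TEXT, PRIMARY KEY (roll_no, course_id))"
)
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [("DB101", "Databases"), ("OS101", "Operating Systems")])
conn.executemany(
    "INSERT INTO enrollment VALUES (?, ?, ?)",
    [(1, "DB101", "2024-07-01"), (1, "OS101", "2024-07-01"), (2, "DB101", "2024-07-02")],
)

per_course = conn.execute(
    "SELECT c.title, COUNT(*) FROM enrollment e JOIN course c ON c.course_id = e.course_id "
    "GROUP BY c.title ORDER BY c.title"
).fetchall()
print(per_course)  # [('Databases', 2), ('Operating Systems', 1)]
```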
Attributes:
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every ellipse
represents one attribute and is directly connected to its entity (rectangle).
Derived attributes: - These are attributes whose values can be derived from other attributes. For example,
Age can be derived from the current date and Date of Birth.
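Because a stored age goes stale every year, a derived attribute is typically computed when needed rather than stored. A minimal sketch in Python; the function name `age` is hypothetical, and `today` is passed in explicitly so the result is reproducible.

```python
from datetime import date

def age(dob, today):
    """Age in completed years, derived from date of birth instead of being stored."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

print(age(date(2000, 5, 20), date(2024, 5, 19)))  # 23
print(age(date(2000, 5, 20), date(2024, 5, 20)))  # 24
```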
Composite Attribute: - This is an attribute made up of two or more simpler attributes. For example, Full
Name is composed of First Name and Last Name.
Single-value attribute: - This is an attribute that has a single value for each entity instance when the
entity is converted into a table. For example, a person's DOB.
Multi-Valued Attribute: - This is an attribute that can have multiple values for a single instance, so when
the entity is converted into a table, the attribute must allow multiple values. For example, a person may
have two contact numbers.
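A common way to map a multi-valued attribute is to give it its own table with one row per value, while single-valued attributes such as DOB stay as columns of the entity's table. A minimal SQLite sketch; the `person` and `person_phone` tables and the phone numbers are hypothetical placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT, dob TEXT)")
# One row per (person, number), so a person can have any number of contact numbers.
conn.execute(
    "CREATE TABLE person_phone (person_id INTEGER REFERENCES person, phone TEXT, "
    "PRIMARY KEY (person_id, phone))"
)
conn.execute("INSERT INTO person VALUES (1, 'Asha', '2000-05-20')")
conn.executemany("INSERT INTO person_phone VALUES (?, ?)", [(1, "98450-11111"), (1, "99001-22222")])

phones = [p for (p,) in conn.execute("SELECT phone FROM person_phone WHERE person_id = 1 ORDER BY phone")]
print(phones)  # ['98450-11111', '99001-22222']
```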
Relationship:
A Relationship describes relations between entities. Relationship is represented using diamonds.
There are three types of relationship that exist between Entities.
● Binary Relationship
● Recursive Relationship
● Ternary Relationship
Binary Relationship
A binary relationship is a relationship between two entities. By cardinality, it is further divided into the
following types.
One to One: This type of relationship is rarely seen in the real world.
For example, one student can enroll for only one course, and a course also has only one student. This is
not what you will usually see in practice.
One to Many: It reflects the business rule that one instance of an entity is associated with many
instances of another entity.
The example for this relationship might sound a little unusual, but it means that one student can enroll
for many courses, while each course has only one student.
Many to One: It reflects the business rule that many instances of an entity can be associated with just
one instance of another entity. For example, a student enrolls for only one course, but a course can have
many students. The arrows in the diagram indicate that one student can enroll for only one course.
Many to Many:
This means that many students can enroll for more than one course, and each course can have many students.
Recursive Relationship
When an entity is related to itself, it is known as a recursive relationship.
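A typical recursive relationship is "Employee manages Employee". In tables, this becomes a foreign key that points back to the same table, and a self-join reads it. A minimal SQLite sketch with a hypothetical `employee` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# manager_id refers back to the same table: Employee "manages" Employee.
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, "
    "manager_id INTEGER REFERENCES employee(emp_id))"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Meera", None), (2, "Asha", 1), (3, "Ravi", 1)],
)

# A self-join pairs each employee with their manager's row from the same table.
pairs = conn.execute(
    "SELECT e.name, m.name FROM employee e JOIN employee m ON e.manager_id = m.emp_id "
    "ORDER BY e.emp_id"
).fetchall()
print(pairs)  # [('Asha', 'Meera'), ('Ravi', 'Meera')]
```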
Ternary Relationship
Relationship of degree three is called Ternary relationship.