What Is a Database
A database is a collection of interrelated data organized so that it can be retrieved,
inserted, and deleted efficiently. The data is organized into structures such as tables,
schemas, views, and reports.
For example, a college database organizes data about the admin, staff, students, and
faculty.
Using the database, you can easily retrieve, insert, and delete this information.
Comparison of the DBMS approach and the file system approach:
Sharing of data
o DBMS: Due to the centralized approach, data sharing is easy.
o File system: Data is distributed across many files, possibly in different formats, so
it isn't easy to share data.
Data Abstraction
o DBMS: Gives an abstract view of data that hides the details.
o File system: Exposes the details of data representation and storage.
Security and Protection
o DBMS: Provides a good protection mechanism.
o File system: It isn't easy to protect a file under the file system.
Recovery Mechanism
o DBMS: Provides a crash recovery mechanism, i.e., the DBMS protects the user from
system failure.
o File system: Has no crash recovery mechanism, i.e., if the system crashes while some
data is being entered, the content of the file will be lost.
Manipulation Techniques
o DBMS: Contains a wide variety of sophisticated techniques to store and retrieve data.
o File system: Can't store and retrieve data efficiently.
Where to use
o DBMS: The database approach is used in large systems that interrelate many files.
o File system: The file system approach is typically used in small systems with few,
largely independent files.
Data Redundancy and Inconsistency
o DBMS: Due to the centralization of the database, the problems of data redundancy and
inconsistency are controlled.
o File system: Files and application programs are created by different programmers, so
there is a lot of duplicated data, which may lead to inconsistency.
Structure
o DBMS: The database structure is complex to design.
o File system: The file system approach has a simple structure.
Data Independence
o DBMS: Data independence exists, and it is of two types: logical data independence and
physical data independence.
o File system: There is no data independence.
Data Models
o DBMS: Three types of data models exist: hierarchical, network, and relational data
models.
o File system: There is no concept of data models.
Flexibility
o DBMS: Changes to the content of the data stored in any system are often necessary,
and these changes are made more easily with a database approach.
o File system: The system is less flexible than the DBMS approach.
DBMS Architecture
o The design of a DBMS depends on its architecture. The basic client/server
architecture is used to deal with a large number of PCs, web servers, database
servers, and other components that are connected by networks.
o The client/server architecture consists of many PCs and workstations that are
connected via the network.
o DBMS architecture depends on how users are connected to the database to
get their requests served.
Database architecture can be seen as single-tier or multi-tier. Logically, however,
database architecture is of two types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the user: the user
works directly on the DBMS.
o Any changes made here are applied directly to the database itself. This architecture
doesn't provide a handy tool for end users.
o The 1-Tier architecture is used for developing local applications, where
programmers can communicate directly with the database for a quick
response.
2-Tier Architecture
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and the server. In
this architecture, the client can't communicate directly with the server.
o The application on the client end interacts with an application server, which
in turn communicates with the database system.
o The end user has no idea about the existence of the database beyond the application
server. The database likewise has no idea about any user beyond the
application.
o The 3-Tier architecture is used for large web applications.
Fig: 3-tier Architecture
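The layering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, and the names `make_database`, `ApplicationServer`, `Client`, and the `account` table are hypothetical. The point is that the client holds a reference to the application server only, never to the database connection.

```python
import sqlite3


def make_database():
    """Database tier: an in-memory SQLite database stands in for the database server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)"
    )
    conn.execute("INSERT INTO account (id, owner, balance) VALUES (1, 'Asha', 500)")
    conn.commit()
    return conn


class ApplicationServer:
    """Application tier: the only layer that holds a database connection."""

    def __init__(self, conn):
        self._conn = conn

    def get_balance(self, account_id):
        row = self._conn.execute(
            "SELECT balance FROM account WHERE id = ?", (account_id,)
        ).fetchone()
        return None if row is None else row[0]


class Client:
    """Client tier: talks to the application server, never to the database."""

    def __init__(self, server):
        self._server = server

    def show_balance(self, account_id):
        balance = self._server.get_balance(account_id)
        if balance is None:
            return "No such account"
        return f"Balance: {balance}"


server = ApplicationServer(make_database())
client = Client(server)
print(client.show_balance(1))
```

In a real deployment the three tiers would run as separate processes or machines and communicate over the network; here they are plain objects so the separation of responsibilities is easy to see.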
Physical Level
This is the lowest level in the three-level architecture. It is also known as the internal
level. The physical level describes how data is actually stored in the database. At the
lowest level, the data is stored on storage devices such as hard drives in the form of
bits; at a slightly higher level, it can be viewed as stored in files and folders. The
physical level also covers compression and encryption techniques.
Conceptual Level
The conceptual level is at a higher level than the physical level. It is also known as the
logical level. It describes how the database appears to the users conceptually and the
relationships between the various data tables. The conceptual level is not concerned with
how the data in the database is actually stored.
External Level
This is the highest level in the three-level architecture and the closest to the user. It
is also known as the view level. The external level shows users only the relevant database
content, in the form of views, and hides the rest of the data. Different users can
therefore see different views of the database according to their individual requirements.
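As a rough illustration of the conceptual and external levels, the sketch below uses Python's built-in sqlite3 module. The `employee` table, its columns, and the `employee_directory` view are hypothetical names chosen for the example: the table definition plays the role of the conceptual level, and the view exposes only the columns a particular group of users should see.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the logical table definition, independent of storage details.
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)",
    [(1, "Asha", 52000), (2, "Ravi", 48000)],
)

# External level: a view that hides the salary column from this group of users.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY id")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```

The physical level does not appear in the code at all: how SQLite lays out pages and bytes is hidden from both the table definition and the view.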
In the chart above, different objects are linked to one another using methods; for
example, one can get the address of a person (represented by the Person object) using the
livesAt() method. Furthermore, these objects have attributes, which are in fact the data
elements that need to be defined in the database.
An example of such a model is the Berkeley DB software library, which uses the same
conceptual background to deliver quick and highly efficient responses to database
queries from the embedded database.
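The idea of objects that carry attributes and are linked through methods can be sketched with plain Python classes. This is not Berkeley DB code; the `Address` class and all attribute names are assumptions made for the illustration, and only the Person object and the livesAt() method come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str  # attribute: a data element that would be defined in the database
    city: str


@dataclass
class Person:
    name: str
    address: Address

    def livesAt(self) -> Address:
        """Method linking a Person object to its Address object."""
        return self.address


person = Person("Asha", Address("12 Hill Road", "Pune"))
print(person.livesAt().city)
```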
4. Relational databases :
Considered the most mature of all database types, relational databases and their
management systems lead in production use. In a relational database, pieces of
information are related to one another through shared values, which is possible because
every record in the database has a unique identity.
Note that all data is tabulated in this model. Every row in a table is uniquely
identified by a primary key, and a table is linked to another table using a foreign key
that refers to that table's primary key.
Refer to the diagram below and notice how the concept of ‘Keys’ is used to link two
tables.
Because tables are an intuitive way to organize data, the relational model has become
exceedingly popular. Relational databases are consequently widely integrated into web
application interfaces to serve as repositories for user data. They are also comparatively
easy to master, since the language used to interact with the database (SQL in this case)
is simple and easy to comprehend.
It is also worth noting that in relational databases, scaling and traversing through
data is a fairly lightweight task in comparison to hierarchical databases.
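To make the role of keys concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customer` and `purchase` tables and their columns are hypothetical; the example only shows how a foreign key column in one table refers to the primary key of another, and how SQL uses that link to combine rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# customer_id is the primary key: it uniquely identifies each customer row.
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# purchase.customer_id is a foreign key pointing at customer.customer_id.
conn.execute(
    """CREATE TABLE purchase (
           purchase_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
           item        TEXT NOT NULL
       )"""
)

conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO purchase VALUES (?, ?, ?)",
    [(10, 1, "keyboard"), (11, 1, "monitor"), (12, 2, "mouse")],
)

# The join follows the foreign key to attach each purchase to its customer.
rows = conn.execute(
    """SELECT c.name, p.item
       FROM purchase AS p
       JOIN customer AS c ON c.customer_id = p.customer_id
       ORDER BY p.purchase_id"""
).fetchall()
print(rows)
```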
5. NoSQL Databases :
A NoSQL database (the name originally referred to "non-SQL" or "non-relational") is a
database that provides a mechanism for storing and retrieving data that is modeled by
means other than the tabular relations used in relational databases.
The motivations for NoSQL databases include simplicity of design, simpler horizontal
scaling to clusters of machines, and finer control over availability. The data structures
used by NoSQL databases differ from those used by default in relational databases, which
makes some operations faster in NoSQL. The suitability of a given NoSQL database depends
on the problem it should solve. The data structures used by NoSQL databases are
sometimes also viewed as more flexible than relational database tables.
MongoDB falls in the category of document-based NoSQL databases.
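The flexibility of the document model can be illustrated without any database at all. The sketch below is not MongoDB's API: the `products` list stands in for a collection, and `find` is a hypothetical helper written for this example. It shows that two documents in the same collection may carry different fields, something a fixed table schema would not allow directly.

```python
import json

# A "collection" modelled as a plain list of documents (dictionaries).
# The two documents deliberately have different fields.
products = [
    {"_id": 1, "name": "Laptop", "specs": {"ram_gb": 16, "ssd_gb": 512}},
    {"_id": 2, "name": "T-shirt", "sizes": ["S", "M", "L"]},
]


def find(collection, **criteria):
    """Return the documents whose top-level fields match all criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(products, name="T-shirt")
print(json.dumps(matches))
```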
Advantages of NoSQL:
There are many advantages of working with NoSQL databases such as MongoDB and
Cassandra. The main advantages are high scalability and high availability.
Disadvantages of NoSQL:
NoSQL has the following disadvantages.
o Most NoSQL databases are open source, and there is no reliable common standard
across them.
o GUI tools for accessing the database are not widely available.
o Backup is a weak point for some NoSQL databases like MongoDB.
o Document sizes can be large.
These are only a few of the database structures that represent the fundamental
concepts used extensively in industry. However, as mentioned earlier, clients tend
to create databases that suit their own needs, storing data in a schema whose
functionality varies with its design. Hence, there remains considerable scope for
development in databases and database management systems.
A data model is the modeling of the data description, data semantics, and consistency
constraints of the data. It provides the conceptual tools for describing the design of a
database at each level of data abstraction. The following four data models are used for
understanding the structure of a database:
1) Relational Data Model: This type of model organizes data in the form of rows and
columns within a table. Thus, a relational model uses tables for representing both data
and the relationships among the data. Tables are also called relations. This model was
initially described by Edgar F. Codd in 1969. The relational data model is the most
widely used model and is used primarily by commercial data processing applications.
2) Entity-Relationship Data Model: An ER model is a logical representation of data as
objects and the relationships among them. These objects are known as entities, and a
relationship is an association among these entities. This model was designed by Peter
Chen and published in a 1976 paper. It is widely used in database design. A set of
attributes describes each entity. For example, student_name and student_id describe the
'student' entity. A set of entities of the same type is known as an 'entity set', and a
set of relationships of the same type is known as a 'relationship set'.
4) Semistructured Data Model: This type of data model differs from the other three
data models. The semistructured data model allows data specifications in which
individual data items of the same type may have different sets of attributes. The
Extensible Markup Language, also known as XML, is widely used for representing
semistructured data. Although XML was initially designed for adding markup information
to text documents, it has gained importance because of its application in the exchange
of data.
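The semistructured model can be illustrated with a small XML fragment parsed by Python's standard xml.etree.ElementTree module. The element names `email` and `phone` are assumptions made for the example (student_id and student_name are borrowed from the ER example above). Both records are of the same type, yet each carries a different set of attributes.

```python
import xml.etree.ElementTree as ET

# Two <student> items of the same type with different attribute sets:
# the first has an email, the second has a phone number instead.
xml_text = """
<students>
  <student>
    <student_id>1</student_id>
    <student_name>Asha</student_name>
    <email>asha@example.com</email>
  </student>
  <student>
    <student_id>2</student_id>
    <student_name>Ravi</student_name>
    <phone>555-0100</phone>
  </student>
</students>
"""

root = ET.fromstring(xml_text)

# Collect the field names each student record actually carries.
fields_per_student = [
    sorted(child.tag for child in student) for student in root.findall("student")
]
print(fields_per_student)
```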
Keys
For example, ID is used as a key in the Student table because it is unique for each
student. In the PERSON table, passport_number, license_number, SSN are keys since
they are unique for each person.
Types of keys:
1. Primary key
o It is the key chosen to identify one and only one instance of an entity uniquely.
An entity can contain multiple keys, as we saw in the PERSON table. The key
that is most suitable among them becomes the primary key.
o In the EMPLOYEE table, ID can be the primary key since it is unique for each
employee. License_Number or Passport_Number could also be selected as the
primary key, since they are unique as well.
o For each entity, the choice of primary key depends on the requirements and on the
developers.
2. Candidate key
For example: In the EMPLOYEE table, ID is best suited to be the primary key. The
remaining unique attributes, like SSN, Passport_Number, and License_Number, are also
candidate keys.
3. Super Key
A super key is a set of attributes that can uniquely identify a tuple. A super key is a
superset of a candidate key.
For example: In the above EMPLOYEE table, consider (EMPLOYEE_ID, EMPLOYEE_NAME).
The names of two employees can be the same, but their EMPLOYEE_ID can't be the
same. Hence, this combination can also be a key.
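The difference between a super key and a candidate key can be checked mechanically on sample data. The sketch below is an illustration only: the rows and the helper names `is_super_key` and `is_candidate_key` are hypothetical, and a real key is a property of the schema (it must hold for all valid data), not of one sample. A candidate key is treated here as a super key from which no attribute can be removed.

```python
from itertools import combinations

# Sample EMPLOYEE rows; two employees share the same name.
employees = [
    {"EMPLOYEE_ID": 1, "EMPLOYEE_NAME": "Asha", "PASSPORT_NUMBER": "P100"},
    {"EMPLOYEE_ID": 2, "EMPLOYEE_NAME": "Ravi", "PASSPORT_NUMBER": "P200"},
    {"EMPLOYEE_ID": 3, "EMPLOYEE_NAME": "Asha", "PASSPORT_NUMBER": "P300"},
]


def is_super_key(rows, attributes):
    """True if no two rows share the same values for the given attributes."""
    seen = {tuple(row[a] for a in attributes) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, attributes):
    """True if the attributes form a super key with no smaller super key inside."""
    if not is_super_key(rows, attributes):
        return False
    return not any(
        is_super_key(rows, subset)
        for size in range(1, len(attributes))
        for subset in combinations(attributes, size)
    )


print(is_super_key(employees, ("EMPLOYEE_ID", "EMPLOYEE_NAME")))
print(is_candidate_key(employees, ("EMPLOYEE_ID", "EMPLOYEE_NAME")))
```

On this sample, (EMPLOYEE_ID, EMPLOYEE_NAME) is a super key but not a candidate key, because EMPLOYEE_ID alone already identifies every row.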
4. Foreign key
o A foreign key is a column of a table that points to the primary key of
another table.
o Every employee works in a specific department in a company, and employee and
department are two different entities. So we don't store the department's
information in the employee table. Instead, we link the two tables through
the primary key of one table.
o We add the primary key of the DEPARTMENT table, Department_Id, as a new
attribute in the EMPLOYEE table.
o In the EMPLOYEE table, Department_Id is the foreign key, and the two tables
are related.
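The EMPLOYEE and DEPARTMENT link described above can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the `Name` columns and the sample values are made up, and SQLite is used only because it ships with Python. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute(
    "CREATE TABLE DEPARTMENT (Department_Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE EMPLOYEE (
           ID            INTEGER PRIMARY KEY,
           Name          TEXT NOT NULL,
           Department_Id INTEGER REFERENCES DEPARTMENT (Department_Id)
       )"""
)

conn.execute("INSERT INTO DEPARTMENT VALUES (10, 'Sales')")
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Asha', 10)")  # department 10 exists

try:
    conn.execute("INSERT INTO EMPLOYEE VALUES (2, 'Ravi', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

The second insert is refused because its Department_Id does not match any row in DEPARTMENT, which is exactly the cross-reference a foreign key is meant to guarantee.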
5. Alternate key
There may be one or more attributes, or combinations of attributes, that uniquely identify
each tuple in a relation. These attributes or combinations of attributes are called
candidate keys. One of the candidate keys is chosen as the primary key, and each
remaining candidate key, if any, is termed an alternate key. In other words, the
total number of alternate keys is the total number of candidate keys minus one (the
primary key). Alternate keys may or may not exist: if there is only one candidate key
in a relation, the relation has no alternate key.
For example, the employee relation has two attributes, Employee_Id and PAN_No, that act
as candidate keys. In this relation, Employee_Id is chosen as the primary key, so the
other candidate key, PAN_No, acts as the alternate key.
6. Composite key
Whenever a primary key consists of more than one attribute, it is known as a composite
key. This key is also known as Concatenated Key.
7. Artificial key
Keys created using arbitrarily assigned data are known as artificial keys. These keys
are created when a primary key is large and complex and has no relationship with many
other relations. The data values of artificial keys are usually numbered in serial
order.
For example, a primary key composed of Emp_ID, Emp_role, and Proj_ID is large in the
employee relation. So it would be better to add a new virtual attribute to
identify each tuple in the relation uniquely.
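One way to sketch this with Python's built-in sqlite3 module is shown below. The table name `assignment` and the column name `assignment_id` are hypothetical. The serially numbered `assignment_id` serves as the artificial key, while a UNIQUE constraint keeps the original combination (Emp_ID, Emp_role, Proj_ID) free of duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE assignment (
           assignment_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- artificial key
           Emp_ID   INTEGER NOT NULL,
           Emp_role TEXT    NOT NULL,
           Proj_ID  INTEGER NOT NULL,
           UNIQUE (Emp_ID, Emp_role, Proj_ID)  -- the composite key stays unique
       )"""
)

insert_sql = "INSERT INTO assignment (Emp_ID, Emp_role, Proj_ID) VALUES (?, ?, ?)"
conn.execute(insert_sql, (1, "developer", 7))
conn.execute(insert_sql, (1, "tester", 7))

# The artificial key values are assigned by the database in serial order.
ids = [
    row[0]
    for row in conn.execute("SELECT assignment_id FROM assignment ORDER BY assignment_id")
]

try:
    conn.execute(insert_sql, (1, "tester", 7))  # same composite value again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(ids, duplicate_rejected)
```

Other tables can then refer to a row through the single `assignment_id` column instead of repeating all three attributes.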
Types of constraints
NOT NULL
UNIQUE
DEFAULT
CHECK
Key Constraints – PRIMARY KEY, FOREIGN KEY
Domain constraints
Mapping constraints
NOT NULL:
The NOT NULL constraint makes sure that a column does not hold a NULL value. When we
don’t provide a value for a particular column while inserting a record into a table, the
column takes the NULL value by default. By specifying the NOT NULL constraint, we can be
sure that the particular column(s) cannot have NULL values.
UNIQUE:
DEFAULT:
The DEFAULT constraint provides a default value to a column when there is no value
provided while inserting a record into a table.
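The NOT NULL, UNIQUE, and DEFAULT constraints can be seen side by side in a short sketch using Python's built-in sqlite3 module. The STUDENT table and ROLL_NO column follow the names used in this section; the other columns (`STU_NAME`, `EMAIL`, `EXAM_FEE`), the sample values, and the `try_insert` helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE STUDENT (
           ROLL_NO  INTEGER NOT NULL,
           STU_NAME TEXT    NOT NULL,       -- a value must always be supplied
           EMAIL    TEXT    UNIQUE,         -- no two rows may share a value
           EXAM_FEE INTEGER DEFAULT 10000   -- used when no value is supplied
       )"""
)


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


ok_first = try_insert(
    "INSERT INTO STUDENT (ROLL_NO, STU_NAME, EMAIL) VALUES (1001, 'Asha', 'asha@example.com')"
)
null_name = try_insert("INSERT INTO STUDENT (ROLL_NO, STU_NAME) VALUES (1002, NULL)")
dup_email = try_insert(
    "INSERT INTO STUDENT (ROLL_NO, STU_NAME, EMAIL) VALUES (1003, 'Ravi', 'asha@example.com')"
)
default_fee = conn.execute(
    "SELECT EXAM_FEE FROM STUDENT WHERE ROLL_NO = 1001"
).fetchone()[0]

print(ok_first, null_name, dup_email, default_fee)
```

The first insert succeeds and receives the default fee, the second is refused because STU_NAME is NULL, and the third is refused because the email is already present.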
CHECK:
This constraint is used for specifying the range of values allowed in a particular column
of a table. When this constraint is set on a column, it ensures that every value in that
column falls within the specified range.
For example, suppose a CHECK constraint requiring values greater than 1000 is set on the
ROLL_NO column of the STUDENT table. The ROLL_NO field must then hold a value greater
than 1000.
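The ROLL_NO rule can be sketched with Python's built-in sqlite3 module. The `STU_NAME` column and the sample rows are assumptions for the illustration; only the STUDENT table, the ROLL_NO column, and the "greater than 1000" condition come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE STUDENT (
           ROLL_NO  INTEGER NOT NULL CHECK (ROLL_NO > 1000),
           STU_NAME TEXT    NOT NULL
       )"""
)

conn.execute("INSERT INTO STUDENT VALUES (1001, 'Asha')")  # satisfies the CHECK

try:
    conn.execute("INSERT INTO STUDENT VALUES (999, 'Ravi')")  # violates ROLL_NO > 1000
    low_roll_rejected = False
except sqlite3.IntegrityError:
    low_roll_rejected = True

print(low_roll_rejected)
```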
Key constraints:
PRIMARY KEY:
A primary key uniquely identifies each record in a table. It must have unique values and
cannot contain nulls. For example, if the ROLL_NO field is marked as the primary
key, the ROLL_NO field cannot have duplicate or null values.
FOREIGN KEY:
Foreign keys are the columns of a table that point to the primary key of another table.
They act as a cross-reference between tables.
Domain constraints:
Each table has a certain set of columns, and each column allows only one type of data,
based on its data type. The column does not accept values of any other data type.
Domain constraints are user-defined data types, and we can define them like this:
Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY /
FOREIGN KEY / CHECK / DEFAULT)