Unit-5 DBMS

The document discusses the limitations of conventional databases, including high costs, complexity, and the risk of database failure. It also introduces emerging database types such as multimedia, temporal, spatial, and cloud databases, highlighting their characteristics, applications, and challenges. Additionally, it provides an overview of Google Bigtable, emphasizing its scalability and use in handling large amounts of structured data.


Unit-5

Emerging Database and case studies

Limitations of Conventional Database

Increased Cost

Hardware and Software Costs: Cost is the main disadvantage of a DBMS. A database
management system needs substantial processing power, which calls for high-speed
processors, and the hardware behind them is expensive. It also needs a large amount
of storage, and that storage must be fast to keep response times low, which adds
further to the cost, as does the DBMS software itself. Setting up and maintaining a
database management system is therefore expensive.

Staff Training and Expense: Training the staff who maintain the database is costly,
and hiring new staff and training them adds further to the overall expense.

Cost of Data Conversion: All existing data has to be converted and loaded into the
database management system. Skilled database designers are needed to design the
entire database, so a large amount of money goes to their salaries and to the
software required to design the database. All of this adds up to increased costs.

Complexity

A database management system is complex to use, and people without proper training
cannot be expected to operate its software. Skilled engineers, developers, and
database administrators are therefore required to design and manage the database.
The database structure itself can also be complex, and if it is designed or mapped
in the wrong way it can lead to data loss or mismanagement of the organization's
data. Because maintaining data in a database management system is a complex task,
it requires a lot of manpower and supporting software.

Database Failure

Database failure is one of the biggest disadvantages of a database management
system. A DBMS requires a lot of maintenance and constant power. Because the data
stored in a DBMS is centralized, if the database server fails the whole system
fails and the organization is affected.

Performance
A database management system works fast when there is little data to work on. As
the organization's data grows, the system becomes heavier and its performance can
degrade, so a file management system is sometimes preferred over a database
management system.

Frequent Updates/Upgrades

DBMS software is regularly updated by its vendors, and updating the software often
means upgrading the hardware it runs on, which adds to the organization's
expenses. New releases also introduce new commands for operating the DBMS, so the
staff maintaining the database have to be retrained with each update, which is a
lot of hassle.

Huge Size

As the data acquired by the organization increases, more storage space has to be
set up. But increasing the storage space makes the database heavier, so searching
and storing data becomes slower and the DBMS takes more time to answer queries,
which makes it inefficient.

Multimedia Database
Multimedia data is an interactive way to represent information to a user. It
includes several categories of data such as textual data, audio data, and video
data. A database used to hold these different kinds of multimedia data is known as
a multimedia database.
As users, we rely on various forms of media such as text, images, audio, video, and
graphic objects to communicate or to obtain information. These media forms are
collectively known as multimedia. Because multimedia provides an interactive way to
display information, managing and storing these different kinds of data is
essential.
A multimedia database is a special type of database that helps us to organize,
query, and store inter-related multimedia data. It facilitates the storage and
retrieval of multimedia data elements. In these databases, media files are stored
as binary strings, encoded according to their file types. The different types of
multimedia databases are described below.

Types of Multimedia Database

Based on the type of multimedia data it stores, a multimedia database is
categorized into three types:
• Static media: These multimedia datasets are used for static media objects,
  i.e., objects that are independent of time, such as images and graphic objects.
• Dynamic media: These datasets store dynamic forms of media content, i.e.,
  multimedia data elements that are time-dependent, like audio data, video data,
  and animations.
• Dimensional media: Dimensional multimedia datasets are typically used in
  Computer-Aided Drafting programs. These operate on 3D multimedia data and
  include various formats used by image and video editing applications.

Content of Multimedia Database

To effectively manage and query a large collection of multimedia data, multimedia
databases store additional information about the data alongside the primary
multimedia data. The contents of a multimedia database are:

• Media data: The actual multimedia data, or primary data, stored in the
  multimedia database. It represents a multimedia object and can be an image,
  audio, video, animation, graphic object, or text.
• Media format data: Information about the format of the multimedia data, such as
  frame rates and encoding schemes.
• Media keyword data: Also known as content descriptive data. It contains
  information related to the generation of the multimedia data, such as the date
  and time at which an image or video was captured.
• Media feature data: Describes the features of the multimedia data, such as the
  distribution of colors.
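
The four kinds of content listed above can be pictured as fields of one record. The following is only a minimal sketch: the class name `MediaRecord`, its field names, the helper `find_by_keyword`, and the sample values are assumptions made for illustration, not part of any particular multimedia database.

```python
from dataclasses import dataclass, field


@dataclass
class MediaRecord:
    """One multimedia object plus the descriptive data stored alongside it."""
    media_data: bytes     # the raw media, kept as a binary string
    format_data: dict     # e.g. encoding scheme, frame rate
    keyword_data: dict    # e.g. date and time of capture
    feature_data: dict = field(default_factory=dict)  # e.g. colour distribution


def find_by_keyword(records, key, value):
    """Return the records whose keyword data has the given key/value pair."""
    return [r for r in records if r.keyword_data.get(key) == value]


# Hypothetical sample records (the byte strings are placeholders, not real files).
records = [
    MediaRecord(b"\x89PNG...", {"encoding": "png"}, {"date": "2024-01-05"},
                {"dominant_color": "blue"}),
    MediaRecord(b"RIFF....", {"encoding": "wav", "sample_rate_hz": 44100},
                {"date": "2024-02-10"}),
]
matches = find_by_keyword(records, "date", "2024-01-05")
```

Querying goes through the descriptive fields rather than the binary data itself, which is why a multimedia database stores them next to the media.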

Applications of Multimedia Database

As discussed, there are many applications of a multimedia database. Some of the
main applications include:

• Documents and record management: Multimedia databases are used in industries
  that require a large set of documentation and records, such as the insurance
  claim industry.
• Education: As multimedia data provides an interactive way to represent data, a
  multimedia database can act as an effective knowledge dissemination tool. These
  applications include the use of multimedia datasets in digital libraries and
  computer-aided learning software.
• Marketing and Entertainment: A multimedia database can act as a data provider
  for entertainment applications like video-on-demand apps and news-on-demand
  apps. It can also provide multimedia data for advertisements and digital
  marketing processes.
• Real-time Monitoring: Combining various software tools with a multimedia
  database allows us to monitor and manage multimedia data in real time. For
  example, a geographic information system (GIS) makes use of multimedia databases
  to analyze and visualize geographical multimedia data in real time.

Temporal Databases

A temporal database is a database that uses some aspect of time to organize its
information. In a temporal database, each tuple in a relation is associated with
time, so the database stores information about the states of the real world across
time. A conventional (non-temporal) database does not store information about past
states; it stores only the current state, and whenever the state changes, the
stored information is overwritten. In many fields, however, it is necessary to keep
information about past states. For example, a stock database must store past stock
prices for analysis. In a conventional database, such historical information has to
be modeled manually in the schema.
Several terms are used in temporal databases:
• Valid Time: The time period during which a fact is true with respect to the
  real world.
• Transaction Time: The time period during which a fact is stored in the
  database.
• Decision Time: The time at which the decision about the fact was made.
Temporal databases are usually built on top of a relational database. Plain
relational databases, however, are a poor fit for temporal data: they do not
provide built-in support for complex temporal operations, and their query
languages offer only limited support for temporal queries.

Applications of Temporal Databases

1. Finance: It is used to maintain stock price histories.
2. Factory monitoring: It can store the current and past readings of the sensors
in a factory.
3. Healthcare: Patient histories need to be maintained in order to give the right
treatment.
4. Banking: It is used to maintain the credit histories of users.

Examples of Temporal Databases

1. An EMPLOYEE table records the department that each employee is assigned to. If
an employee is transferred to another department at some point in time, this can
be tracked if the EMPLOYEE table is an application-time period table that assigns
the appropriate time period to each department he/she works for.
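
The EMPLOYEE example can be sketched with Python's built-in `sqlite3` module. SQLite has no native period tables, so this sketch emulates valid time with two ordinary date columns; the table name `employee_department`, the column names, the sample rows, and the helper `department_on` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_department (
        emp_id     INTEGER,
        department TEXT,
        valid_from TEXT,   -- ISO date, inclusive
        valid_to   TEXT    -- ISO date, exclusive; 9999-12-31 marks the current row
    )
""")
conn.executemany(
    "INSERT INTO employee_department VALUES (?, ?, ?, ?)",
    [
        (1, "Sales",     "2020-01-01", "2022-07-01"),
        (1, "Marketing", "2022-07-01", "9999-12-31"),
    ],
)


def department_on(emp_id, day):
    """Return the department the employee belonged to on the given ISO date."""
    row = conn.execute(
        "SELECT department FROM employee_department "
        "WHERE emp_id = ? AND valid_from <= ? AND ? < valid_to",
        (emp_id, day, day),
    ).fetchone()
    return row[0] if row else None


before_transfer = department_on(1, "2021-03-15")
after_transfer = department_on(1, "2023-01-01")
```

Because ISO dates sort correctly as strings, a plain comparison is enough to find the row whose period contains the requested day.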

Temporal Relation

A temporal relation is a relation in which each tuple is associated with time. The
time can be either transaction time or valid time.

Types of Temporal Relation

There are mainly three types of temporal relations:

1. Uni-Temporal Relation: A relation that is associated with either valid time or
transaction time, that is, with only one kind of time.
2. Bi-Temporal Relation: A relation that is associated with both valid time and
transaction time. Valid time has two parts, a start time and an end time, and the
same holds for transaction time.
3. Tri-Temporal Relation: A relation that is associated with three aspects of
time, namely valid time, transaction time, and decision time.
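
A bi-temporal relation can be sketched in plain Python by giving every version of a fact both a valid-time period and a transaction-time period. The names `Version`, `as_of`, and `FOREVER`, and the stock-price values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

FOREVER = "9999-12-31"   # assumed marker for an open-ended period


@dataclass(frozen=True)
class Version:
    value: str
    valid_from: str   # when the fact starts being true in the real world
    valid_to: str     # exclusive end of the valid period
    tx_from: str      # when this version was recorded in the database
    tx_to: str        # exclusive; when this version was superseded


history = [
    # Recorded on 2023-01-10: the price is 100 from 2023-01-01 onwards.
    Version("100", "2023-01-01", FOREVER, "2023-01-10", "2023-02-01"),
    # Correction recorded on 2023-02-01: the price was actually 105.
    Version("105", "2023-01-01", FOREVER, "2023-02-01", FOREVER),
]


def as_of(versions, valid_day, tx_day):
    """What did the database, as it stood on tx_day, say was true on valid_day?"""
    for v in versions:
        if v.valid_from <= valid_day < v.valid_to and v.tx_from <= tx_day < v.tx_to:
            return v.value
    return None


believed_in_january = as_of(history, "2023-01-15", "2023-01-20")
believed_in_march = as_of(history, "2023-01-15", "2023-03-01")
```

The same real-world day yields two answers depending on the transaction time, which is exactly the distinction a uni-temporal relation cannot express.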

Features of Temporal Databases

• A temporal database provides built-in support for the time dimension.
• It stores data together with its time aspects.
• It contains historical data in addition to current data.
• It provides a uniform way to deal with historical data.

Challenges of Temporal Databases

1. Data Storage: In temporal databases, each version of the data needs to be
stored separately. As a result, temporal databases require more storage than
non-temporal databases.

2. Schema Design: The temporal database schema must accommodate the time
dimension. Creating such a schema is more difficult than creating a schema for a
non-temporal database.

3. Query Processing: Processing a query in a temporal database is slower than in
a non-temporal database because of the additional complexity of managing temporal
data.

Spatial Databases
A spatial database is a database that is enhanced to store and access spatial
data, that is, data that defines a geometric space. These data are often associated
with geographic locations and features, or with constructed features such as cities
and towns. Data in spatial databases are stored as coordinates, points, lines,
polygons, and topology. Some spatial databases handle more complex data such as
three-dimensional objects, topological coverages, and linear networks. A spatial
database is optimized to store and query data representing objects defined in a
geometric space.

Characteristics of Spatial Database

A spatial database system has the following characteristics:

• It is a database system.
• It offers spatial data types (SDTs) in its data model and query language.
• It supports spatial data types in its implementation, providing at least spatial
  indexing and efficient algorithms for spatial join.

Example

A road map is a visualization of geographic information. It is a 2-dimensional
object containing points, lines, and polygons that can represent cities, roads,
and political boundaries such as states or provinces.
In general, spatial data can be of two types:

• Vector data: This data is represented as discrete points, lines, and polygons.
• Raster data: This data is represented as a matrix of square cells.

Spatial data in the form of points, lines, polygons, etc. is used by many
different databases.

Spatial data is diverse. Over the years, spatial data has grown. Now, spatial data
covers everything from simple vector data (points, lines, or polygons) to imagery,
complex 3D scenes, and even indoor locations. Representing real-world objects with
accuracy or performing analysis can be quite complex. This is why we need spatial
databases (also known as geospatial databases).

Spatial databases are built to store and provide powerful query capabilities for
spatial data. Spatial data is often much larger in size than traditional data because of
its additional locational component. Spatial databases make the storage of complex
spatial data possible. Traditional database management systems are not designed to
store, query, and index spatial data efficiently.

Spatial support may be built natively into a database (e.g., Microsoft SQL Server)
or provided as an extension to an existing database (e.g., the popular PostGIS
extension for PostgreSQL).

How do Spatial Databases differ from each other?

When it comes to comparing spatial databases, we can look at three primary
features:

• Spatial data types
• Spatial queries
• Spatial indexes

Together, these three components comprise the basis of a spatial database. They
will help you decide which spatial database is most suitable for your enterprise
or business.

Spatial Data Type

Spatial data comes in all shapes and sizes. Most spatial databases support points,
lines, and polygons, and some support many more spatial data types. Some databases
follow the standards set by the Open Geospatial Consortium (OGC). Even so, that
does not mean it is easy to move data between databases.

This is where data integration tools such as the FME platform are useful. FME
supports over 450 different systems and applications, so data, spatial or
otherwise, can be moved between databases that would otherwise be hard to connect.

Spatial Queries

Spatial queries perform an action on spatial data stored in the database. Some
spatial queries can be used to perform simple operations. However, some queries
can become much more complex, invoking spatial functions that span multiple tables.
A spatial query using SQL allows you to retrieve a specific subset of spatial data. This
helps you retrieve only what you need from your database.

This is how data is retrieved in spatial databases. The spatial query capabilities can
vary from database to database, both in terms of performance and functionality.
This is important to consider when you select your database.
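
To make the idea of a spatial query concrete, the sketch below answers "which cities lie inside this district?" in plain Python using the ray-casting point-in-polygon test. A real spatial database would expose such a predicate through its own SQL functions; the function name `point_in_polygon`, the district polygon, and the city coordinates are assumptions made for illustration.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon (list of vertices)?"""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside   # one more crossing to the right of the point
    return inside


# Hypothetical vector data: one rectangular district and three cities.
district = [(0, 0), (4, 0), (4, 3), (0, 3)]
cities = {"A": (1, 1), "B": (5, 2), "C": (3, 2.5)}

cities_in_district = sorted(
    name for name, location in cities.items()
    if point_in_polygon(location, district)
)
```

Only the subset of points satisfying the spatial predicate is returned, which mirrors how a spatial query retrieves only what is needed.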

Because they retrieve the requested data efficiently for your business systems,
spatial queries support a whole new class of business decisions.

Spatial Indexes

What does the added size and complexity of spatial data mean for your database?
Will it run slower? Will large spatial datasets be too bulky for your database to
store?

This is why spatial indexes are important. Spatial indexes are created with SQL
commands, issued either from the database management interface or from an external
program (e.g., FME) with access to your spatial database. Spatial indexes vary
from database to database and are responsible for the query performance needed
when adding spatial data to your decision making.
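
The benefit of a spatial index can be illustrated with a uniform grid, one of the simplest index structures. Production databases typically use more elaborate structures, so this is only a sketch; the class `GridIndex`, its methods, and the sample points are hypothetical.

```python
from collections import defaultdict


class GridIndex:
    """Toy spatial index: points are bucketed into square grid cells."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)   # (cell_x, cell_y) -> [(name, x, y), ...]

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def insert(self, name, x, y):
        self.cells[self._cell(x, y)].append((name, x, y))

    def query_box(self, xmin, ymin, xmax, ymax):
        """Return names of points inside the box, visiting only overlapping cells."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for name, x, y in self.cells.get((cx, cy), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(name)
        return sorted(hits)


index = GridIndex(cell_size=10)
index.insert("school", 3, 4)
index.insert("hospital", 12, 18)
index.insert("park", 95, 95)

nearby = index.query_box(0, 0, 20, 20)
```

The query inspects only the cells that overlap the search box instead of scanning every stored point, which is the performance gain a spatial index is meant to deliver.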

Cloud Databases
A cloud database is a database that is deployed in a cloud environment as opposed
to an on-premise environment. The database itself can be offered as
a SaaS (Software-as-a-Service) application or simply be hosted in a cloud-based
virtual machine. Applications can then access all the data stored in a cloud database
over a network from any device.

With a cloud database, there is no need for dedicated hardware to host a database.
Rather than the organization itself installing, configuring, and maintaining a database
instance or instances, the cloud provider can provision, manage, and scale the
underlying database cluster.
You can deploy any type of database in the cloud. This includes traditional SQL
databases and more modern NoSQL types of databases. MongoDB Atlas is a
general-purpose document database that can be deployed on any of the major cloud
providers, like Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.

When to use a cloud database

Cloud databases work in most cases where traditional databases do. They are
particularly valuable when building software products that:

• require a large volume of data.
• are cloud-native.
• need to handle high-scale traffic.
• are distributed geographically.

Fully managed databases are now increasingly used for real-time transaction
processing, legacy database migration, mobile application development, Internet of
Things, caching, and analytics.

Google Bigtable

Bigtable is ideal for storing large amounts of single-keyed data with low latency.
It supports high read and write throughput at low latency, and it's an ideal data
source for MapReduce operations.

Google Bigtable is a distributed, column-oriented data store created by Google Inc.
to handle very large amounts of structured data associated with the company's
Internet search and Web services operations.

Bigtable was designed to support applications requiring massive scalability; from its
first iteration, the technology was intended to be used with petabytes of data. The
database was designed to be deployed on clustered systems and uses a simple data
model that Google has described as "a sparse, distributed, persistent
multidimensional sorted map." Data is assembled in order by row key, and indexing
of the map is arranged according to row, column keys and
timestamps. Compression algorithms help achieve high capacity.
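
The "sorted map" data model can be sketched as a map keyed by (row key, column key, timestamp) whose keys are kept in sorted order. This toy class is not Bigtable's API; the class `SortedMap`, its methods, and the sample row and column names are assumptions made for illustration.

```python
import bisect


class SortedMap:
    """Toy model of a map keyed by (row key, column key, timestamp)."""

    def __init__(self):
        self._keys = []      # sorted list of (row, column, -timestamp)
        self._values = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)   # negated so newer versions sort first
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get_latest(self, row, column):
        """Return the most recent value stored for (row, column), or None."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan_rows(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._values[self._keys[i]]
            i += 1


table = SortedMap()
table.put("com.example/index", "contents:html", 1, "<html>v1</html>")
table.put("com.example/index", "contents:html", 2, "<html>v2</html>")
table.put("org.sample/home", "anchor:text", 1, "Sample")

latest = table.get_latest("com.example/index", "contents:html")
scanned_rows = [entry[0] for entry in table.scan_rows("com.", "com.z")]
```

Because data is ordered by row key, a scan over a contiguous key range touches only neighbouring entries, and several timestamped versions of the same cell can coexist.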

Google Bigtable serves as the database for applications such as the Google App
Engine Datastore, Google Personalized Search, Google Earth and Google Analytics.
Google has maintained the software as a proprietary, in-house technology.
Nevertheless, Bigtable has had a large impact on NoSQL database design. Google
software developers publicly disclosed Bigtable details in a technical paper
presented at the USENIX Symposium on Operating Systems Design and Implementation
(OSDI) in 2006.

NoSQL

A NoSQL database is a non-relational data management system that does not require
a fixed schema. It avoids joins and is easy to scale. The major purpose of using a
NoSQL database is for distributed data stores with very large data storage needs.
NoSQL is used for big data and real-time web apps. For example, companies like
Twitter, Facebook and Google collect terabytes of user data every single day.
NoSQL stands for “Not Only SQL” or “Not SQL.” Though a better term would be
“NoREL”, NoSQL caught on. Carlo Strozzi introduced the term NoSQL in 1998.

Traditional RDBMS uses SQL syntax to store and retrieve data for further insights.
A NoSQL database system, by contrast, encompasses a wide range of database
technologies that can store structured, semi-structured, unstructured and
polymorphic data.

Features of NoSQL

Non-relational

• NoSQL databases do not follow the relational model
• They do not provide tables with flat fixed-column records
• They work with self-contained aggregates or BLOBs
• They do not require object-relational mapping or data normalization
• They typically omit complex features such as query languages, query planners,
  referential integrity, joins, and ACID transactions

Schema-free

• NoSQL databases are either schema-free or have relaxed schemas
• They do not require any sort of definition of the schema of the data
• They allow heterogeneous structures of data in the same domain

Simple API

• Offers easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols are mostly used, typically HTTP REST with JSON
• Mostly no standards-based NoSQL query language is used
• Web-enabled databases run as internet-facing services

Distributed

• Multiple NoSQL databases can be executed in a distributed fashion
• Offers auto-scaling and fail-over capabilities
• ACID guarantees are often sacrificed for scalability and throughput
• Mostly no synchronous replication between distributed nodes; instead,
  asynchronous multi-master replication, peer-to-peer replication, or HDFS
  replication is used
• Often provides only eventual consistency
• Shared-nothing architecture, which enables less coordination and higher
  distribution

Types of NoSQL Databases

NoSQL databases are mainly categorized into four types: key-value pair,
column-oriented, graph-based and document-oriented. Every category has its unique
attributes and limitations. No single type is best for all problems; users should
select the database based on their product needs.

• Key-value pair based
• Column-oriented
• Graph-based
• Document-oriented

Key Value Pair Based

Data is stored in key/value pairs. This type is designed to handle lots of data
and heavy load.
Key-value stores keep data as a hash table where each key is unique, and the value
can be JSON, a BLOB (Binary Large Object), a string, etc.

For example, a key-value pair may contain a key like “Website” associated with a
value like “Guru99”.

It is one of the most basic types of NoSQL database. This kind of NoSQL database is
used as a collection, dictionary, associative array, etc. Key-value stores help the
developer to store schema-less data. They work well for data such as shopping cart
contents.
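
A key-value store can be sketched with a Python dictionary, which is itself a hash table. The class `KeyValueStore`, its methods, and the cart contents are hypothetical and do not correspond to the API of any specific product.

```python
import json


class KeyValueStore:
    """Toy key-value store: unique keys mapped to opaque serialized values."""

    def __init__(self):
        self._data = {}   # a Python dict is a hash table

    def put(self, key, value):
        # The store does not interpret the value; it only keeps the serialized form.
        self._data[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("Website", "Guru99")
store.put("cart:user42", {"items": [{"sku": "A1", "qty": 2}]})
store.put("session:temp", "expires soon")
store.delete("session:temp")

website = store.get("Website")
cart = store.get("cart:user42")
```

Every operation goes through the key alone, which is what keeps the model simple and easy to scale, and also why richer queries need a different type of store.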

Redis, Dynamo, and Riak are some examples of key-value stores. Dynamo was described
in Amazon's Dynamo paper, and Riak's design is based on that paper.

Column-based
Column-oriented databases work on columns and are based on the Bigtable paper by
Google. Every column is treated separately, and the values of a single column are
stored contiguously.

They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN
etc. as the data is readily available in a column.
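
The reason aggregation is cheap in a column layout can be seen in a small sketch that converts row-oriented records into per-column lists. The order data and the variable names are assumptions made for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "region": "north", "amount": 120},
    {"order_id": 2, "region": "south", "amount": 80},
    {"order_id": 3, "region": "north", "amount": 100},
]

# Column-oriented layout: all values of one column are stored together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate only needs to read the single column it operates on.
total_amount = sum(columns["amount"])
average_amount = total_amount / len(columns["amount"])
max_amount = max(columns["amount"])
```

In the row layout, the same SUM would have to step over every field of every record; in the column layout it reads one contiguous list.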

Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM, and library card catalogs.

HBase, Cassandra, and Hypertable are examples of column-based databases.

Document-Oriented
A document-oriented NoSQL DB stores and retrieves data as a key-value pair, but the
value part is stored as a document. The document is stored in JSON or XML format.
The value is understood by the DB and can be queried.
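
Because the database understands the document, it can filter on fields inside it. The sketch below imitates that with a list of Python dictionaries; the function `find`, the `posts` collection, and its fields are hypothetical and only mimic the style of a document query.

```python
def find(collection, **criteria):
    """Return the documents whose fields match every given key/value pair."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Documents in one collection need not share the same fields.
posts = [
    {"_id": 1, "title": "Intro", "author": "asha", "tags": ["nosql"]},
    {"_id": 2, "title": "Spatial", "author": "ravi"},
    {"_id": 3, "title": "CAP", "author": "asha",
     "comments": [{"by": "li", "text": "nice"}]},
]

by_asha = find(posts, author="asha")
ids_by_asha = [doc["_id"] for doc in by_asha]
```

A plain key-value store could only fetch a post by its key, whereas here the query looks inside the value.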

Relational Vs. Document

A relational database organizes data into rows and columns, whereas a document
database stores data in a structure similar to JSON. For a relational database, you
have to know in advance what columns you have, and so on. For a document database,
the data is stored like a JSON object; you are not required to define its structure
beforehand, which makes it flexible.

The document type is mostly used for CMS systems, blogging platforms, real-time
analytics and e-commerce applications. It should not be used for complex
transactions that require multiple operations or queries against varying aggregate
structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular
document-oriented DBMS systems.

Graph-Based
A graph database stores entities as well as the relations amongst those entities.
Each entity is stored as a node, and each relationship as an edge between nodes.
Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph
database is multi-relational in nature. Traversing relationships is fast because
they are already captured in the DB and there is no need to calculate them.
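
Traversal over stored relationships can be sketched with an adjacency list and a breadth-first search. The `follows` graph and the function `shortest_path` are assumptions made for illustration; graph databases provide their own query languages for this.

```python
from collections import deque

# Hypothetical social graph: each node maps to the nodes it has an edge to.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def shortest_path(graph, start, goal):
    """Breadth-first search returning the shortest chain of edges, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


path_to_dave = shortest_path(follows, "alice", "dave")
```

Each step simply follows edges that are already stored with the node, with no join needed to discover them.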

Graph databases are mostly used for social networks, logistics, and spatial data.

Neo4J, Infinite Graph, OrientDB, FlockDB are some popular graph-based databases.

Query Mechanism tools for NoSQL


The most common data retrieval mechanism is REST-based retrieval of a value by its
key/ID, using a GET request on the resource.

Document stores offer more complex queries because they understand the value in a
key-value pair. For example, CouchDB allows defining views with MapReduce.
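
The idea behind a MapReduce view can be sketched in plain Python: a map function emits key/value pairs for each document, and a reduce function combines the values emitted for each key. This is not CouchDB's API; the functions `map_doc`, `reduce_counts`, and `build_view`, and the sample documents, are hypothetical.

```python
from collections import defaultdict


def map_doc(doc):
    """Map step: emit (tag, 1) for every tag found in a document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_counts(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def build_view(docs, map_fn, reduce_fn):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


docs = [
    {"title": "Intro", "tags": ["nosql", "cap"]},
    {"title": "Scaling", "tags": ["nosql"]},
    {"title": "Untagged"},
]
tag_counts = build_view(docs, map_doc, reduce_counts)
```

The view is derived entirely from the document contents, which is only possible because a document store can read inside the stored value.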

What is the CAP Theorem?


The CAP theorem is also called Brewer's theorem. It states that it is impossible
for a distributed data store to offer more than two out of the following three
guarantees:

1. Consistency
2. Availability
3. Partition Tolerance

Consistency:

The data should remain consistent even after the execution of an operation. This
means that once data is written, any future read request should return that data.
For example, after updating the order status, all the clients should be able to see
the same data.

Availability:

The database should always be available and responsive. It should not have any
downtime.

Partition Tolerance:

Partition tolerance means that the system should continue to function even if the
communication among the servers is not stable. For example, the servers can be
partitioned into multiple groups which may not be able to communicate with each
other. Here, even if part of the database is unreachable, the other parts should
continue to operate.

Eventual Consistency
Eventual consistency arises when copies of data are kept on multiple machines to
get high availability and scalability. Changes made to any data item on one machine
have to be propagated to the other replicas.

Data replication may not be instantaneous: some copies are updated immediately,
while others are updated in due course of time. The copies may be mutually
inconsistent for a while, but in due course of time they become consistent. Hence
the name eventual consistency.
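
The behaviour described above can be simulated with a few in-memory replicas and a queue of updates that have not been delivered yet. The classes `Replica` and `Cluster`, the method names, and the order data are assumptions made for illustration, not a real replication protocol.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy cluster: writes hit the first replica and reach the others later."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = []   # updates not yet delivered: (replica, key, value)

    def write(self, key, value):
        primary, *others = self.replicas
        primary.data[key] = value                       # applied immediately
        self.pending.extend((r, key, value) for r in others)

    def read(self, replica_index, key):
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        """Deliver every pending update, bringing the replicas back in sync."""
        for replica, key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()


cluster = Cluster(["r1", "r2", "r3"])
cluster.write("order:7", "shipped")

read_from_primary = cluster.read(0, "order:7")
stale_read = cluster.read(1, "order:7")    # second replica has not caught up yet
cluster.propagate()
fresh_read = cluster.read(1, "order:7")
```

Between the write and the propagation step the replicas disagree; once the pending updates are delivered, every replica returns the same value.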

BASE: Basically Available, Soft state, Eventual consistency

• Basically available means the DB is available all the time, in the sense of
  availability in the CAP theorem
• Soft state means that the system state may change even without input
• Eventual consistency means that the system will become consistent over time

Advantages of NoSQL

• Can be used as a primary or analytic data source, including as the primary data
  source for online applications
• Handles big data, managing data velocity, variety, volume, and complexity
• No single point of failure
• Easy replication
• Eliminates the need for a separate caching layer to store data
• Provides fast performance and horizontal scalability
• Can handle structured, semi-structured, and unstructured data with equal effect
• Fits object-oriented programming, which makes it easy to use and flexible
• Does not need a dedicated high-performance server
• Supports key developer languages and platforms
• Simpler to implement than an RDBMS
• Excels at distributed database and multi-data center operations
• Offers a flexible schema design which can easily be altered without downtime or
  service disruption

Disadvantages of NoSQL

• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively mature
• Does not offer traditional database capabilities such as consistency when
  multiple transactions are performed simultaneously
• As the volume of data increases, it becomes difficult to maintain unique keys
• Doesn't work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, which makes them less popular with enterprises

Features of NoSQL
Non-relational

 NoSQL databases never follow the relational model


 Never provide tables with flat fixed-column records
 Work with self-contained aggregates or BLOBs
 Doesn’t require object-relational mapping and data normalization
 No complex features like query languages, query planners,referential
integrity joins, ACID

Schema-free

 NoSQL databases are either schema-free or have relaxed schemas


 Do not require any sort of definition of the schema of the data
 Offers heterogeneous structures of data in the same domain

NoSQL is Schema-Free

Simple API

 Offers easy to use interfaces for storage and querying data provided
 APIs allow low-level data manipulation & selection methods
 Text-based protocols mostly used with HTTP REST with JSON
 Mostly used no standard based NoSQL query language
 Web-enabled databases running as internet-facing services

Distributed

 Multiple NoSQL databases can be executed in a distributed fashion


 Offers auto-scaling and fail-over capabilities
 Often ACID concept can be sacrificed for scalability and throughput
 Mostly no synchronous replication between distributed nodes Asynchronous
Multi-Master Replication, peer-to-peer, HDFS Replication
 Only providing eventual consistency
 Shared Nothing Architecture. This enables less coordination and higher
distribution.

NoSQL is Shared Nothing.

Types of NoSQL Databases


NoSQL Databases are mainly categorized into four types: Key-value pair,
Column-oriented, Graph-based and Document-oriented. Every category has its
unique attributes and limitations. None of the above-specified database is better to
solve all the problems. Users should select the database based on their product
needs.
Types of NoSQL Databases:

 Key-value Pair Based


 Column-oriented Graph
 Graphs based
 Document-oriented
Key Value Pair Based
Data is stored in key/value pairs. It is designed in such a way to handle lots of data
and heavy load.

Key-value pair storage databases store data as a hash table where each key is unique,
and the value can be a JSON, BLOB(Binary Large Objects), string, etc.

For example, a key-value pair may contain a key like “Website” associated with a
value like “Guru99”.

It is one of the most basic NoSQL database example. This kind of NoSQL database is
used as a collection, dictionaries, associative arrays, etc. Key value stores help the
developer to store schema-less data. They work best for shopping cart contents.

Redis, Dynamo, Riak are some NoSQL examples of key-value store DataBases. They
are all based on Amazon’s Dynamo paper.

Column-based
Column-oriented databases work on columns and are based on the BigTable paper by
Google. Every column is treated separately. The values of a single column are
stored contiguously.
Column based NoSQL database

They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN
etc. as the data is readily available in a column.
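
A minimal sketch of why this helps, assuming a small made-up table of orders (the field names `id`, `region`, and `amount` are hypothetical). Once the rows are rearranged into one list per column, an aggregate only has to read the single column it needs:

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "north", "amount": 120},
    {"id": 2, "region": "south", "amount": 80},
    {"id": 3, "region": "north", "amount": 200},
]

# Column-oriented layout: all values of one column sit together.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Aggregations read only the "amount" column and skip the others.
amounts = columns["amount"]
total = sum(amounts)
count = len(amounts)
average = total / count
smallest = min(amounts)
```

Real column stores add compression and on-disk layouts on top of this idea; the sketch shows only the access pattern.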

Column-based NoSQL databases are widely used to manage data warehouses,
business intelligence, CRM, and library card catalogs.

HBase, Cassandra, and Hypertable are examples of column-based databases.

Document-Oriented:
Document-Oriented NoSQL DB stores and retrieves data as a key value pair but the
value part is stored as a document. The document is stored in JSON or XML formats.
The value is understood by the DB and can be queried.

Relational Vs. Document

In the diagram, the left side shows a relational table with rows and columns, and
the right side shows a document database, whose structure resembles JSON. With a
relational database, you have to know in advance which columns you have. With a
document database, data is stored as JSON-like objects, and you do not need to
define the structure up front, which makes it flexible.
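
As a rough illustration of a store that understands its values, the sketch below keeps JSON-like documents under keys and filters them by a field inside the document. The helper `find`, the dotted field path, and the sample documents are assumptions for this example and do not correspond to the API of any particular product.

```python
_MISSING = object()  # marks a field that a document does not have

documents = {
    "post:1": {
        "title": "Intro to NoSQL",
        "tags": ["nosql", "db"],
        "author": {"name": "Asha"},
    },
    # No schema is declared, so this document can omit "author".
    "post:2": {"title": "CAP basics", "tags": ["distributed"]},
}


def find(docs, field_path, expected):
    """Return the keys of documents whose (possibly nested) field equals expected."""
    matches = []
    for key, doc in docs.items():
        value = doc
        for part in field_path.split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                value = _MISSING
                break
        if value is not _MISSING and value == expected:
            matches.append(key)
    return matches


by_author = find(documents, "author.name", "Asha")
by_title = find(documents, "title", "CAP basics")
```

Documents with different shapes live side by side, and a query on a field that some documents lack simply skips them.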

The document type is mostly used for CMS systems, blogging platforms, real-time
analytics and e-commerce applications. It should not be used for complex transactions
which require multiple operations or queries against varying aggregate structures.

Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular
document-oriented DBMS systems.

Graph-Based
A graph database stores entities as well as the relations amongst those entities.
Each entity is stored as a node, and each relationship as an edge. An edge gives a
relationship between nodes. Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph
database is multi-relational in nature. Traversing relationships is fast, as they are
already captured in the DB and there is no need to calculate them.
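
A small sketch of such a traversal, assuming a made-up social graph stored as an adjacency list (the names and the helper `within_hops` are hypothetical). Because each node already holds its outgoing edges, finding "friends of friends" is a walk over stored relationships rather than a join:

```python
from collections import deque

# Each node lists the nodes it is directly related to.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first walk returning nodes reachable in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, distance = queue.popleft()
        if distance == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                reached.append(neighbour)
                queue.append((neighbour, distance + 1))
    return reached


direct_friends = within_hops(edges, "alice", 1)
friends_of_friends = within_hops(edges, "alice", 2)
```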

Graph-based databases are mostly used for social networks, logistics, and spatial data.

Neo4J, Infinite Graph, OrientDB, FlockDB are some popular graph-based databases.

Query Mechanism tools for NoSQL


The most common data retrieval mechanism is REST-based retrieval of a value
by its key/ID with a GET on the resource.

Document store databases support more complex queries, as they understand the value
in a key-value pair. For example, CouchDB allows defining views with MapReduce.
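
The idea behind a MapReduce view can be sketched as follows. CouchDB itself defines views as JavaScript functions; the Python below only imitates the concept, and the documents and the names `map_order`, `reduce_sum`, and `build_view` are assumptions for this example.

```python
from collections import defaultdict

docs = [
    {"_id": "o1", "customer": "ann", "total": 30},
    {"_id": "o2", "customer": "bob", "total": 15},
    {"_id": "o3", "customer": "ann", "total": 20},
]


def map_order(doc):
    # Map step: emit one (key, value) pair per document.
    yield doc["customer"], doc["total"]


def reduce_sum(values):
    # Reduce step: combine all values emitted under the same key.
    return sum(values)


def build_view(documents, map_fn, reduce_fn):
    """Group the emitted pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


view = build_view(docs, map_order, reduce_sum)
```

The resulting view maps each customer to the sum of their order totals, which a plain key lookup on `_id` could not provide.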

What is the CAP Theorem?


The CAP theorem is also called Brewer’s theorem. It states that it is impossible for a
distributed data store to offer more than two out of the following three guarantees
at the same time:

1. Consistency
2. Availability
3. Partition Tolerance

Consistency:

The data should remain consistent even after the execution of an operation. This
means that once data is written, any future read request should return that data. For
example, after the order status is updated, all clients should be able to see the
same data.

Availability:

The database should always be available and responsive. It should not have any
downtime.

Partition Tolerance:

Partition Tolerance means that the system should continue to function even if the
communication among the servers is not stable. For example, the servers can be
partitioned into multiple groups which may not communicate with each other. Here,
if part of the database is unreachable, the other parts should keep operating.

Eventual Consistency
The term “eventual consistency” arises from keeping copies of data on multiple
machines to get high availability and scalability. Changes made to any data item
on one machine have to be propagated to the other replicas.

Data replication may not be instantaneous, as some copies will be updated
immediately while others are updated in due course of time. These copies may be
mutually inconsistent for a while, but in due course of time they become
consistent. Hence the name eventual consistency.
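
A toy simulation of this behaviour, assuming one primary copy, one follower, and a queue of changes that have not yet been replicated. The names `Replica`, `write`, and `propagate` are hypothetical, and real systems replicate over a network rather than through a Python list:

```python
class Replica:
    """One copy of the data, held on one machine."""

    def __init__(self):
        self.data = {}


def write(primary, pending, key, value):
    # The write is applied to the primary at once and queued for the others.
    primary.data[key] = value
    pending.append((key, value))


def propagate(pending, replicas):
    # Later, the queued changes are applied to every other replica.
    for key, value in pending:
        for replica in replicas:
            replica.data[key] = value
    pending.clear()


primary, follower = Replica(), Replica()
pending = []

write(primary, pending, "order:7", "shipped")
stale_read = follower.data.get("order:7")   # follower has not seen the write yet

propagate(pending, [follower])
fresh_read = follower.data.get("order:7")   # copies have now converged
```

Between the write and the propagation step the two copies disagree; after it they hold the same data, which is what "eventual" refers to.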

BASE: Basically Available, Soft state, Eventual consistency

 Basically Available means the DB is available all the time, in the sense of
the CAP theorem
 Soft state means that the system state may change even without input
 Eventual consistency means that the system will become consistent over
time

Advantages of NoSQL

 Can be used as a primary or analytic data source
 Big data capability
 No single point of failure
 Easy replication
 No need for a separate caching layer
 Provides fast performance and horizontal scalability
 Can handle structured, semi-structured, and unstructured data with equal
effect
 Fits object-oriented programming, which makes it easy to use and flexible
 Does not need a dedicated high-performance server
 Supports key developer languages and platforms
 Simpler to implement than an RDBMS
 Can serve as the primary data source for online applications
 Handles big data, managing data velocity, variety, volume, and
complexity
 Excels at distributed database and multi-data center operations
 Eliminates the need for a specific caching layer to store data
 Offers a flexible schema design which can easily be altered without downtime
or service disruption
Disadvantages of NoSQL

 No standardization rules
 Limited query capabilities
 RDBMS databases and tools are comparatively more mature
 Often lacks traditional database capabilities, such as consistency when
multiple transactions are performed simultaneously
 As the volume of data increases, maintaining unique keys becomes
difficult
 Doesn’t work as well with relational data
 The learning curve is steep for new developers
 Many options are open source, which makes them less popular with enterprises
