Nosql

Uploaded by

othor0170

EXPERIMENT-01

AIM: Introduction to NOSQL


Procedure:
1. Research and write a brief report on the history of NoSQL.
2. List and describe the key features: scalability, cost, and flexibility.
3. Illustrate the CAP theorem with an example.
Introduction:
A NoSQL database (the name stands for "non-SQL" or "non-relational") stores and retrieves
data that is modeled in ways other than the tabular relations used in relational databases.
Such databases first appeared in the late 1960s, but they were not called NoSQL until the
early twenty-first century. NoSQL databases are increasingly used in real-time web
applications and big data analytics. The name is sometimes read as "Not only SQL" to stress
that NoSQL systems may support SQL-like query languages.
A NoSQL database offers design simplicity, horizontal scaling to clusters of servers, and
finer control over availability. Because NoSQL databases use different data structures than
relational databases, they can execute some operations faster.
The suitability of a NoSQL database depends on the problem it is meant to solve. The data
structures used by NoSQL databases are often regarded as more flexible than relational
database tables.
Most NoSQL databases offer eventual consistency, which means that changes to the database
are propagated to all nodes over time. As a result, a query may not immediately return
updated data and may instead return an older value, a condition known as a stale read.
Some NoSQL systems can also lose writes or suffer other forms of data loss, and some include
mechanisms such as write-ahead logging to prevent it. When distributed transactions span
several databases, maintaining data consistency becomes much more complex.
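The stale read described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the Primary and Replica classes and the replicate function are hypothetical names, and replication is triggered by hand rather than over a network.

```javascript
// Toy model of eventual consistency (hypothetical classes, not a real database API).
// The primary applies writes immediately; the replica only sees them after replicate() runs.
class Primary {
  constructor() {
    this.data = new Map();
    this.pending = []; // writes not yet shipped to the replica
  }
  write(key, value) {
    this.data.set(key, value);
    this.pending.push([key, value]);
  }
}

class Replica {
  constructor() {
    this.data = new Map();
  }
  read(key) {
    return this.data.get(key);
  }
}

// Ship all pending writes to the replica, in the order they were made.
function replicate(primary, replica) {
  for (const [key, value] of primary.pending) {
    replica.data.set(key, value);
  }
  primary.pending = [];
}

const primary = new Primary();
const replica = new Replica();

primary.write("balance", 100);
replicate(primary, replica);

primary.write("balance", 80);
const staleRead = replica.read("balance"); // replica has not caught up yet
replicate(primary, replica);
const freshRead = replica.read("balance"); // replica has converged

console.log({ staleRead, freshRead });
```

Between the second write and the second replicate call, the replica still returns the old value; once replication runs, both copies agree again.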
NoSQL databases are generally classified into four main categories:
1. Document databases: These databases store data as semi-structured documents, such
as JSON or XML, and can be queried using document-oriented query languages.
2. Key-value stores: These databases store data as key-value pairs, and are optimized
for simple and fast read/write operations.
3. Column-family stores: These databases store data as column families, which are sets
of columns that are treated as a single entity. They are optimized for fast and efficient
querying of large amounts of data.
4. Graph databases: These databases store data as nodes and edges, and are designed to
handle complex relationships between data.

The History of NoSQL Databases

Flat file systems were popular in the early 1970s. Data was saved in flat files, which had
several drawbacks: each organization used its own file layouts, and there were no standards.
Without a standard way to store data, writing data to files and retrieving it again was
difficult.
E.F. Codd proposed the relational model, which addressed the lack of a standard way to
store data.
Relational databases, however, later struggled to handle very large volumes of data. This
created demand for databases that could cope with such workloads, and NoSQL databases
emerged to meet it.
In 2006, Google published a paper on Bigtable, its wide-column database, which explored
the possibilities of a distributed storage system. 2009 saw a major rise in NoSQL
databases, with two key document-oriented databases, MongoDB and CouchDB, gaining
prominence. By the 2010s, many different types of NoSQL database had emerged, and
acceptance of NoSQL became widespread as businesses became more data-driven.
Additionally, the Agile Manifesto was rising in popularity, and software engineers were
rethinking the way they developed software. They had to rapidly adapt to changing
requirements, iterate quickly, and make changes throughout their software stack — all the
way down to the database. NoSQL databases gave them this flexibility.
Cloud computing also rose in popularity, and developers began using public clouds to host
their applications and data. They wanted the ability to distribute data across multiple servers
and regions to make their applications resilient, to scale out instead of scale up, and to
intelligently geo-place their data. Some NoSQL databases, like MongoDB Atlas, provide
these capabilities.
Due to the exponential growth of digitization, businesses now collect as much unstructured
data as possible. To be able to analyze and derive actionable real-time insights from such big
data, businesses need modern solutions that go beyond simple storage.

Key Features of NoSQL:


Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to
a database cluster, making them well-suited for handling large amounts of data and high
levels of traffic.
Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and
dynamic manner, with support for multiple data types and changing data structures.
Cost-effectiveness: NoSQL databases are often more cost-effective than traditional relational
databases at scale, because they can scale out across clusters of inexpensive servers rather
than requiring a single, more powerful machine.
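Horizontal scaling can be sketched as spreading keys across several nodes by hashing each key. The ShardedStore class and hashKey function below are hypothetical, and each node is just an in-memory Map; the sketch only shows the routing idea, not how any particular database does it.

```javascript
// Simple string hash; any stable hash function would do for this illustration.
function hashKey(key) {
  let h = 5381;
  for (let i = 0; i < key.length; i++) {
    h = ((h * 33) ^ key.charCodeAt(i)) >>> 0; // keep the value an unsigned 32-bit integer
  }
  return h;
}

// Hypothetical store that routes each key to one of several nodes.
class ShardedStore {
  constructor(nodeCount) {
    // Each Map stands in for a separate server in the cluster.
    this.nodes = Array.from({ length: nodeCount }, () => new Map());
  }
  nodeFor(key) {
    return hashKey(key) % this.nodes.length;
  }
  set(key, value) {
    this.nodes[this.nodeFor(key)].set(key, value);
  }
  get(key) {
    return this.nodes[this.nodeFor(key)].get(key);
  }
}

const store = new ShardedStore(3);
for (let i = 0; i < 9; i++) {
  store.set(`user:${i}`, { id: i });
}

const perNode = store.nodes.map((node) => node.size);
console.log("keys per node:", perNode);
console.log("user:4 ->", store.get("user:4"));
```

Adding a node here means constructing the store with a larger nodeCount. With plain modulo routing, most keys would then map to a different node, which is why production systems tend to use schemes such as consistent hashing instead.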

The CAP Theorem
The CAP theorem, also known as the CAP principle, explains some of the competing
requirements in a distributed system with replication. It was formulated to make system
designers aware of the trade-offs involved in designing networked shared-data systems. This
section describes the three properties of the CAP theorem and the trade-offs between them,
using real-life examples, so that they can be prioritized for a given use case.
The three letters in CAP refer to three desirable properties of Distributed Systems with
replicated data: Consistency (among replicated copies), Availability (of the system for read
and write operations) and Partition Tolerance (of the nodes in the network being partitioned
by a network fault).
Let’s take a detailed look at the three distributed system characteristics to which the CAP
theorem refers.
Consistency
Consistency means that all clients see the same data at the same time, no matter which node
they connect to. For this to happen, whenever data is written to one node, it must be instantly
forwarded or replicated to all the other nodes in the system before the write is deemed
‘successful.’
Availability
Availability means that any client making a request for data gets a response, even if one or
more nodes are down. Another way to state this—all working nodes in the distributed system
return a valid response for any request, without exception.
Partition tolerance
A partition is a communications break within a distributed system—a lost or temporarily
delayed connection between two nodes. Partition tolerance means that the cluster must
continue to work despite any number of communication breakdowns between nodes in the
system.
The CAP theorem states that a distributed database can guarantee at most two of the three
properties: consistency, availability, and partition tolerance. In practice, network
partitions cannot be ruled out, so when a partition occurs the system must choose between
consistency and availability.

CA (Consistency and Availability)
A CA system always accepts requests to view or modify data, and always responds with data
that is consistent across all database nodes.
However, such a system is not realizable as a distributed system in the real world. When a
network failure occurs, there are only two options: either serve the older data that was
replicated before the failure, or refuse access to that data. The first option keeps the
system available but not consistent; the second keeps it consistent but not available.
To achieve both consistency and availability, the system has to be monolithic: when one user
updates the state of the system, all other users see the change, so consistency is
maintained, and because all users connect to the same single system, it is also available.
Such systems are generally not preferred where distributed computing is required, because
distribution is only possible when either consistency or availability is sacrificed for
partition tolerance.
Example databases: MySQL, PostgreSQL

AP (Availability and Partition Tolerance)


These systems are distributed in nature. Requests to view or modify data are not dropped and
are still processed in the presence of a network partition.
The system prioritizes availability over consistency and may respond with stale data that
was replicated from other nodes before the partition occurred. This design is common for
social media sites such as Facebook, Instagram, and Reddit, and for online content sites such
as YouTube, blogs, and news sites, where strict consistency is usually not required and
downtime is the bigger problem: an unavailable service loses money because users may move to
another platform. The system can be distributed across multiple nodes and is designed to
operate reliably even in the face of network partitions.
Example databases: Amazon DynamoDB, Apache Cassandra.

CP (Consistency and Partition Tolerance)


These systems are distributed in nature. In the presence of a network partition, requests to
view or modify data are rejected rather than answered with inconsistent data.

The system prioritizes consistency over availability and does not let users read data from a
replica that was last updated before the network partition occurred. Consistency is chosen
over availability for critical applications where the latest data matters, such as stock
market, ticket booking, and banking applications, where showing old data to users causes
problems.
For example, suppose a train ticket booking application has one seat left. The database is
replicated to the other nodes of the distributed system. A network outage then cuts one node
off, and a user connected to that partitioned node reads from its replica. Meanwhile, a user
connected to the unpartitioned part of the network books the last remaining seat. The user on
the partitioned node still sees one seat available, so the data is inconsistent. It would be
better to show that user an error, making the system unavailable to them but keeping it
consistent. Hence consistency is chosen in such scenarios.

Example databases: Apache HBase, MongoDB, Redis.
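The ticket booking example can be simulated to compare the AP and CP choices. The sketch below is a deliberately small model: makeCluster, book, and runScenario are hypothetical helpers, the cluster has just two sides ("majority" and "minority"), and the partition is switched on by a flag.

```javascript
// Hypothetical two-sided cluster; each side keeps its own copy of the seat count.
function makeCluster(mode, seats) {
  return {
    mode, // "AP" or "CP"
    partitioned: false,
    majority: { seats },
    minority: { seats },
  };
}

// Try to book one seat through the given side of the cluster.
function book(cluster, side) {
  const node = cluster[side];
  // CP: the cut-off side refuses to answer rather than risk inconsistent data.
  if (cluster.partitioned && cluster.mode === "CP" && side === "minority") {
    return "error: unavailable";
  }
  if (node.seats === 0) {
    return "error: sold out";
  }
  node.seats -= 1;
  if (!cluster.partitioned) {
    // No partition: copy the new seat count to the other side immediately.
    const other = side === "majority" ? cluster.minority : cluster.majority;
    other.seats = node.seats;
  }
  return "booked";
}

// One seat left, a partition occurs, then one user books on each side.
function runScenario(mode) {
  const cluster = makeCluster(mode, 1);
  cluster.partitioned = true;
  return [book(cluster, "majority"), book(cluster, "minority")];
}

const apResults = runScenario("AP");
const cpResults = runScenario("CP");
console.log("AP:", apResults);
console.log("CP:", cpResults);
```

In AP mode both users are told the seat is booked, so the single seat is sold twice. In CP mode the user on the cut-off side gets an error instead, which matches the behaviour the example argues for.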

EXPERIMENT-2
Aim: NoSQL vs RDBMS, types and classification of NOSQL Databases

Relational databases (RDBMS) and NoSQL databases differ primarily in how they store and manage data.
RDBMS uses a structured format with predefined schemas and tables, making it ideal for transactional systems
where data consistency and integrity are paramount. In contrast, NoSQL databases offer flexibility with
dynamic schemas, supporting diverse data models such as document, key-value, columnar, and graph stores.
NoSQL databases are built for scalability and performance in handling large-scale, distributed data, often
sacrificing strict consistency for faster, more flexible operations. Each is suited to different use cases, depending
on the application needs.

Difference between Relational database and NoSQL:

Parameter      | Relational Database (RDBMS)             | NoSQL
---------------|-----------------------------------------|-----------------------------------------------
Data Model     | Tables with rows and columns            | Key-Value, Document, Column, Graph models
Schema         | Fixed schema (predefined structure)     | Dynamic schema (flexible structure)
Scalability    | Vertical scalability                    | Horizontal scalability
Transactions   | Follows ACID properties                 | Follows BASE properties (Eventual consistency)
Query Language | SQL (Structured Query Language)         | Varies (e.g., JSON-like queries, custom APIs)
Consistency    | Strong consistency                      | Eventual consistency (depending on the model)
Performance    | Slower for high-volume read/write loads | Faster for large-scale data handling
Joins          | Supports complex joins                  | Does not support joins (de-normalized data)
Flexibility    | Less flexible due to rigid schema       | Highly flexible, schema-less or schema-on-read
Scaling Costs  | Higher cost due to vertical scaling     | Lower cost with horizontal scaling
Use Cases      | Suitable for transactional systems      | Suitable for big data, real-time applications
Examples       | MySQL, PostgreSQL, Oracle               | MongoDB, Cassandra, Redis, Neo4j
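The "Joins" and "Data Model" rows can be illustrated by storing the same data both ways. This is a sketch with made-up sample data: the customers and orders arrays stand in for relational tables, customerDoc stands in for a single document, and ordersWithCustomer is a hypothetical helper that does in application code what a SQL join would do.

```javascript
// Relational style: two "tables" linked by a foreign key (customerId).
const customers = [{ id: 1, name: "Asha" }];
const orders = [
  { id: 10, customerId: 1, total: 250 },
  { id: 11, customerId: 1, total: 90 },
];

// Combine the two tables the way a join would, using a lookup by customer id.
function ordersWithCustomer(customerRows, orderRows) {
  const byId = new Map(customerRows.map((c) => [c.id, c]));
  return orderRows.map((o) => ({
    orderId: o.id,
    total: o.total,
    customer: byId.get(o.customerId).name,
  }));
}

// Document style: the orders are embedded (de-normalized) in the customer document.
const customerDoc = {
  _id: 1,
  name: "Asha",
  orders: [
    { id: 10, total: 250 },
    { id: 11, total: 90 },
  ],
};

const joined = ordersWithCustomer(customers, orders);
const embeddedTotal = customerDoc.orders.reduce((sum, o) => sum + o.total, 0);

console.log(joined);
console.log("total from embedded orders:", embeddedTotal);
```

The embedded form answers "what did this customer order?" with a single read, at the cost of duplicating data if the same order details are needed elsewhere.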
Types of NoSQL Database:
● Document-based databases
● Key-value stores
● Column-oriented databases
● Graph-based databases

Classification of NOSQL databases


Document-Based Database:

A document-based database is a non-relational database. Instead of storing data in rows and columns
(tables), it stores data as documents, typically in JSON, BSON, or XML format.
Documents can be stored and retrieved in a form that is much closer to the data objects used in applications,
so less translation is needed to use the data in application code. In a document database, individual fields
can be indexed for faster querying.

Key-Value Stores:

A key-value store is a nonrelational database. The simplest form of a NoSQL database is a key-
value store. Every data element in the database is stored in key-value pairs. The data can be
retrieved by using a unique key allotted to each element in the database. The values can be simple
data types like strings and numbers or complex objects.

Column Oriented Databases:

A column-oriented database is a non-relational database that stores data by column instead of by
row. When we want to run analytics on a small number of columns, we can read those columns
directly without loading unwanted data into memory.
Columnar databases are designed to read data efficiently and retrieve it quickly. They are used to
store large amounts of data.
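The difference between row and column layouts can be shown with a small in-memory sketch. The sample rows and the toColumns helper are hypothetical; real columnar databases add compression and on-disk formats that are not modeled here.

```javascript
// Row-oriented layout: each record keeps all of its fields together.
const rows = [
  { id: 1, city: "Pune", amount: 120 },
  { id: 2, city: "Delhi", amount: 80 },
  { id: 3, city: "Pune", amount: 50 },
];

// Convert to a column-oriented layout: one array per field.
function toColumns(records) {
  const columns = {};
  for (const record of records) {
    for (const [name, value] of Object.entries(record)) {
      if (!columns[name]) {
        columns[name] = [];
      }
      columns[name].push(value);
    }
  }
  return columns;
}

const columns = toColumns(rows);

// An aggregate over one column touches only that column's array.
const totalAmount = columns.amount.reduce((sum, value) => sum + value, 0);

console.log(columns);
console.log("total amount:", totalAmount);
```

Summing the amount column reads a single array and never touches the id or city values, which is the access pattern the text describes for analytics.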

Graph-Based databases:

Graph-based databases focus on the relationships between elements. They store data as nodes,
and the connections between nodes are called links or relationships.
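
A graph of nodes and relationships can be sketched with an adjacency list. The Graph class, the FRIEND label, and the friendsOfFriends helper are hypothetical names chosen for this illustration; they do not correspond to any graph database's query language.

```javascript
// Minimal directed graph: each node name maps to a list of labeled outgoing edges.
class Graph {
  constructor() {
    this.edges = new Map();
  }
  addNode(name) {
    if (!this.edges.has(name)) {
      this.edges.set(name, []);
    }
  }
  addEdge(from, label, to) {
    this.addNode(from);
    this.addNode(to);
    this.edges.get(from).push({ label, to });
  }
  // Names of nodes reachable from `name` over one edge with the given label.
  neighbors(name, label) {
    return (this.edges.get(name) || [])
      .filter((edge) => edge.label === label)
      .map((edge) => edge.to);
  }
}

// People two FRIEND hops away who are not already direct friends.
function friendsOfFriends(graph, person) {
  const direct = new Set(graph.neighbors(person, "FRIEND"));
  const result = new Set();
  for (const friend of direct) {
    for (const candidate of graph.neighbors(friend, "FRIEND")) {
      if (candidate !== person && !direct.has(candidate)) {
        result.add(candidate);
      }
    }
  }
  return [...result];
}

const graph = new Graph();
graph.addEdge("alice", "FRIEND", "bob");
graph.addEdge("alice", "FRIEND", "dave");
graph.addEdge("bob", "FRIEND", "carol");
graph.addEdge("dave", "FRIEND", "carol");
graph.addEdge("bob", "FRIEND", "alice");

const suggestions = friendsOfFriends(graph, "alice");
console.log(suggestions);
```

Relationship queries like this one follow edges directly from node to node, which is the kind of work a graph database is built for.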
EXPERIMENT-3
AIM: A Case Study on Key Value Database

Objectives: -
● Understand the basic concepts of key-value databases in NoSQL.
● Learn the key features of key-value databases.
● Understand how a key-value store fits within non-relational databases.

Introduction: Key value databases, also known as key value stores, are database types where data is stored in a
“key-value” format and optimized for reading and writing that data. The data is fetched by a unique key or a
number of unique keys to retrieve the associated value with each key. The values can be simple data types like
strings and numbers or complex objects.

MongoDB covers a wide range of database use cases, including key-value data. With its flexible schema and a
rich query language with secondary indexes, MongoDB can serve as a store for “key-value” data.

What is a key-value database?

Over the years, database systems have evolved from legacy relational databases that store data in rows and
columns to distributed NoSQL databases that allow a different solution for each use case. Key-value stores are
not a new concept and have been in use for decades. One well-known example is the Windows Registry, which
lets the system and applications store data in a “key-value” structure, where a key is a unique identifier or
a unique path to the value.

Data is written (inserted, updated, and deleted) and queried based on the key to store/retrieve its value.

Fig.1 key-value databases


How do key-value databases work?

A key-value database, also known as a key-value store, associates a value (which can be anything from a number
or simple string to a complex object) with a key, which is used to keep track of the object. In its simplest
form, a key-value store is like the dictionary or map object found in most programming languages, but stored
persistently and managed by a Database Management System (DBMS).

Key-value databases use compact, efficient index structures to locate a value by its key quickly and reliably,
making them well suited to systems that need to find and retrieve data in near-constant time. Redis, for
instance, is a key-value database optimized for relatively simple data structures (strings, lists, sets,
sorted sets, and hashes), with optional persistence to disk. By supporting only a limited number of value
types, Redis can expose an extremely simple interface for querying and manipulating them, and when configured
well it is capable of high throughput.

What are the features of a key-value database?

A key-value database is defined by the fact that it allows programs or users of programs to retrieve data by
keys, which are essentially names, or identifiers, that point to some stored value. Because key-value databases
are defined so simply, but can be extended and optimized in numerous ways, there is no global list of features,
but there are a few common ones:

● Retrieving a value (if there is one) stored and associated with a given key
● Deleting the value (if there is one) stored and associated with a given key
● Setting, updating, and replacing the value (if there is one) associated with a given key

Modern applications will probably require more than the above, but this is the bare minimum for a key-value
store.
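
The three minimum operations listed above can be sketched with a thin wrapper around JavaScript's built-in Map. The KeyValueStore class is a hypothetical, in-memory illustration; unlike a real key-value database, it does not persist anything.

```javascript
// Minimal in-memory key-value store exposing only get, set, and delete.
class KeyValueStore {
  constructor() {
    this.items = new Map();
  }
  // Retrieve the value for a key, or undefined if there is none.
  get(key) {
    return this.items.get(key);
  }
  // Set, update, or replace the value for a key.
  set(key, value) {
    this.items.set(key, value);
  }
  // Delete the value for a key; returns true if something was removed.
  delete(key) {
    return this.items.delete(key);
  }
}

const kv = new KeyValueStore();
kv.set("session:ueyrt", { user: "john", createTime: 1122384666000 });
kv.set("session:ueyrt", { user: "john", createTime: 1122384999000 }); // replaces the old value

const session = kv.get("session:ueyrt");
const removed = kv.delete("session:ueyrt");
const afterDelete = kv.get("session:ueyrt");

console.log(session, removed, afterDelete);
```

Values here are arbitrary objects, matching the point that a value can be a simple type or a complex object.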

When to use a key-value database

There are several use cases where a key-value store is a good fit:

● Real-time random data access, e.g., user session attributes in an online application such as gaming or
finance.
● Caching of frequently accessed data or configuration, looked up by key.
● Applications designed around simple key-based queries.

MongoDB as a key-value store

MongoDB stores data in collections, which are groups of BSON (Binary JSON) documents, where each
document is built from field-value pairs. Because MongoDB can efficiently store flexible-schema documents
and index any field for random lookups, it can serve as a key-value store.

{
session_id : "ueyrt-jshdt-6783-utyrts",
create_time : 1122384666000
}

Further, MongoDB’s document values allow nested key-value structures, allowing not only for accessing data
by key in a global sense, but accessing and manipulating data associated with keys within documents, and even
creating indexes that allow fast retrieval by these secondary kinds of keys.

{
  name: "John",
  age : 35,
  dob : ISODate("1990-01-05"),
  profile_pic : "https://example.com/john.jpg",
  social : {
    twitter : "@mongojohn",
    linkedin : "https://linkedin.com/abcd_mongojohn"
  }
}

MongoDB’s native drivers support many widely used languages, such as Python, C#, C++, and Node.js, so
you can store key-value data from your language of choice.

Secondary indexes to support key value

Any field can be indexed based on your query patterns. For example, if we look up a specific session_id as
the key and want create_time as the value, we can create the index
db.sessions.createIndex({session_id : 1}) and query on that key:

db.sessions.findOne({session_id : "ueyrt-jshdt-6783-utyrts" },{create_time : 1}).create_time;

Wildcard indexes to support key value

Wildcard indexing allows users to index every field or a subset of fields in a MongoDB collection. Therefore,
if a single document stores a set of field-value pairs and queries can arrive dynamically for any of those
fields, we can create a single index covering all of them.
db.profiles.createIndex({"$**" : 1 });

As a result, queries on any field are supported by this one index. That said, wildcard indexing should only
be used when the field names cannot be predicted up front and the variety of query predicates requires it.
See the wildcard index restrictions in the MongoDB documentation for more information.

Schema design to support key value

Since MongoDB documents can be complex objects, applications can use a schema design to minimize index
footprints and optimize access for a “key-value” approach. This design pattern is called the Attribute
Pattern and it utilizes arrays of documents to store a “key-value” structure.

attributes: [
{
key: "USA",
value: ISODate("1977-05-20T01:00:00+01:00")
},
{
key: "France",
value: ISODate("1977-10-19T01:00:00+01:00")
},
{
key: "Italy",
value: ISODate("1977-10-20T01:00:00+01:00")
},
{
key: "UK",
value: ISODate("1977-12-27T01:00:00+01:00")
},
...
]

Indexing {"attributes.key" : 1 , "attributes.value" : 1} allows us to search on any key with just one index.
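
The idea behind indexing the attributes array can be sketched in plain JavaScript, outside MongoDB. This is only an analogy: the sample documents, buildAttributeIndex, and findByAttribute are hypothetical, and the Map below imitates what a single compound index over key and value makes possible, not how MongoDB implements it.

```javascript
// Sample documents using the attribute pattern (made-up titles and dates).
const movies = [
  {
    _id: 1,
    title: "Movie A",
    attributes: [
      { key: "USA", value: "1977-05-20" },
      { key: "France", value: "1977-10-19" },
    ],
  },
  {
    _id: 2,
    title: "Movie B",
    attributes: [{ key: "USA", value: "1980-05-21" }],
  },
];

// Build one index over every (key, value) pair in every document.
function buildAttributeIndex(docs) {
  const index = new Map();
  for (const doc of docs) {
    for (const { key, value } of doc.attributes) {
      const entry = `${key}|${value}`;
      if (!index.has(entry)) {
        index.set(entry, []);
      }
      index.get(entry).push(doc._id);
    }
  }
  return index;
}

// Look up document ids by any attribute key and value using the single index.
function findByAttribute(index, key, value) {
  return index.get(`${key}|${value}`) || [];
}

const attributeIndex = buildAttributeIndex(movies);
const releasedInFrance = findByAttribute(attributeIndex, "France", "1977-10-19");
const releasedInUsa1980 = findByAttribute(attributeIndex, "USA", "1980-05-21");

console.log(releasedInFrance, releasedInUsa1980);
```

Because every country shares the same key and value fields, one index serves lookups for any country, instead of one index per country-named field.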

Key-value database vs cache

Databases that act as key-value stores persist the data to disk, while a key-value cache mostly keeps the
data in memory. After a server fault or restart, a cache has to be reloaded, because its contents were not
persisted.
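
The difference between a persistent store and a cache can be sketched as follows. The CachedStore class is hypothetical, the disk Map merely stands in for durable storage that is assumed to survive a restart, and restart() simulates a server restart by emptying the in-memory cache.

```javascript
// Stand-in for durable storage; assumed to survive restarts in this sketch.
const disk = new Map([["user:1", "Asha"]]);

class CachedStore {
  constructor(durable) {
    this.disk = durable;
    this.cache = new Map(); // in-memory only, lost on restart
    this.diskReads = 0;
  }
  get(key) {
    if (this.cache.has(key)) {
      return this.cache.get(key); // served from memory
    }
    this.diskReads += 1; // cache miss: fall back to durable storage
    const value = this.disk.get(key);
    if (value !== undefined) {
      this.cache.set(key, value);
    }
    return value;
  }
  // Simulate a server restart: the cache is emptied, the disk is untouched.
  restart() {
    this.cache.clear();
  }
}

const cachedStore = new CachedStore(disk);
cachedStore.get("user:1"); // miss, reads from disk
cachedStore.get("user:1"); // hit, served from the cache
const readsBeforeRestart = cachedStore.diskReads;

cachedStore.restart();
const valueAfterRestart = cachedStore.get("user:1"); // miss again, cache was lost
const readsAfterRestart = cachedStore.diskReads;

console.log({ readsBeforeRestart, valueAfterRestart, readsAfterRestart });
```

The value is still available after the restart only because it was also held in durable storage; a pure cache would have had to be reloaded from some other source.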

MongoDB uses the cache of its WiredTiger engine to optimize data access and read performance together with
strong consistency and high availability across replica sets. This allows for more resilient and available field-
value stores while still using the best performance of cached data.

Advantages of key-value databases

A key-value approach allows data to be held in efficient, compact structures and accessed through simple
key-based fetch, update, and remove operations.

MongoDB documents can form compact flexible structures to support fast indexing for your key-value stores.
On the other hand, MongoDB documents may consist of rich objects which can contain entire hierarchies and
sub-values, and sophisticated indexing allows documents to be retrieved by any number of different keys.

Conclusion

Key-value stores suit use cases where applications need to retrieve values quickly by key, much like maps or
dictionaries in programming languages. The compact structure, straightforward indexing, and fast seeks through
those indexes make this database concept a good fit for specific application workloads.

However, modern applications will probably require more than key-value retrieval, and this is where
MongoDB and MongoDB Atlas can help. MongoDB can support the field-value store approach while also allowing
complex objects and multiple ways to query the data: Full-Text Search, the Aggregation Framework, Atlas Data
Tiering, or scaling across multiple shards.
