NoSQL
The History of NoSQL Databases
Flat file systems were popular in the early 1970s. Data was saved in flat files, which have
several drawbacks: each organization used its own flat file formats, and there were no
standards. Because there was no standard way to store data, it was extremely difficult to
put data into files and retrieve data from them.
E.F. Codd devised the relational database, which solved the problem of there being no
standard way to store data.
However, relational databases later struggled to handle very large volumes of data; as a
result, demand arose for a database that could handle these difficulties, and the NoSQL
database was created.
In the early 2000s, a paper published by Google on BigTable, the wide-column database,
explored the wide range of possibilities for a distributed storage system. 2009 saw a major
rise in NoSQL databases, with two key document-oriented databases, MongoDB and
CouchDB, coming into the picture. By the 2010s, different types of NoSQL databases had
emerged and the acceptance of NoSQL became widespread, with businesses becoming more
data-driven.
Additionally, the Agile Manifesto was rising in popularity, and software engineers were
rethinking the way they developed software. They had to rapidly adapt to changing
requirements, iterate quickly, and make changes throughout their software stack — all the
way down to the database. NoSQL databases gave them this flexibility.
Cloud computing also rose in popularity, and developers began using public clouds to host
their applications and data. They wanted the ability to distribute data across multiple servers
and regions to make their applications resilient, to scale out instead of scale up, and to
intelligently geo-place their data. Some NoSQL databases, like MongoDB Atlas, provide
these capabilities.
Due to the exponential growth of digitization, businesses now collect as much unstructured
data as possible. To be able to analyze and derive actionable real-time insights from such big
data, businesses need modern solutions that go beyond simple storage.
The CAP Theorem
The CAP theorem, also known as the CAP principle, explains some of the competing
requirements in a distributed system with replication. It was created to make system
designers aware of the trade-offs involved in designing networked shared-data systems. This
section discusses the CAP theorem and the trade-offs it describes using real-life examples.
After reading it, you should be able to differentiate between the three properties of the CAP
theorem and prioritize them based on your use case.
The three letters in CAP refer to three desirable properties of Distributed Systems with
replicated data: Consistency (among replicated copies), Availability (of the system for read
and write operations) and Partition Tolerance (of the nodes in the network being partitioned
by a network fault).
Let’s take a detailed look at the three distributed system characteristics to which the CAP
theorem refers.
Consistency
Consistency means that all clients see the same data at the same time, no matter which node
they connect to. For this to happen, whenever data is written to one node, it must be instantly
forwarded or replicated to all the other nodes in the system before the write is deemed
‘successful.’
Availability
Availability means that any client making a request for data gets a response, even if one or
more nodes are down. In other words, all working nodes in the distributed system return a
valid response for any request, without exception.
Partition tolerance
A partition is a communications break within a distributed system—a lost or temporarily
delayed connection between two nodes. Partition tolerance means that the cluster must
continue to work despite any number of communication breakdowns between nodes in the
system.
The CAP theorem states that a distributed database can provide at most two of the three
properties: consistency, availability, and partition tolerance. Since network partitions cannot
be ruled out in practice, the real choice during a partition is between consistency and
availability, and database systems prioritize accordingly.
CA (Consistency and Availability)
These types of systems always accept requests to view or modify data, and they always
respond with data that is consistent across all the database nodes of a large distributed
network.
However, such distributed systems are not realizable in the real world, because when a
network failure occurs there are two options: either serve the old data that was replicated
moments before the failure, or refuse to let the user access that stale data. If we choose the
first option, our system remains available; if we choose the second, it remains consistent.
The combination of consistency and availability is therefore not possible in a partitioned
distributed system. To achieve CA, the system has to be monolithic, so that when a user
updates its state, all other users accessing it see the new changes, which means consistency
is maintained. And since it follows a monolithic architecture, all users are connected to a
single system, which means it is also available. These types of systems are generally not
preferred when distributed computing is required, which can only be achieved by sacrificing
consistency or availability for partition tolerance.
Example databases: MySQL, PostgreSQL
CP (Consistency and Partition Tolerance)
The system prioritizes consistency over availability and does not allow users to read crucial
data from a stored replica that was last synchronized before the network partition occurred.
Consistency is chosen over availability for critical applications where the latest data plays an
important role, such as stock market, ticket booking, and banking applications, where
problems would arise if stale data were presented to users.
For example, in a train ticket booking application, suppose there is one seat left to book. A
replica of the database is created and sent to the other nodes of the distributed system. A
network outage then occurs, causing a user connected to the partitioned node to fetch details
from this replica. Meanwhile, a user connected to the unpartitioned part of the network books
the last remaining seat. The user connected to the partitioned node, however, will still see one
seat, which makes the available data inconsistent. It would have been better to show that user
an error, making the system unavailable for them while maintaining consistency. Hence
consistency is chosen in such scenarios.
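The trade-off in this booking example can be sketched as a toy two-node replica pair in Python. This is a simplification for illustration only (all class and key names are hypothetical), not how any real database is implemented:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}          # this node's local replica


class Cluster:
    """Two replicas with synchronous replication; CP behavior on partition."""

    def __init__(self):
        self.a, self.b = Node("a"), Node("b")
        self.partitioned = False   # is the link between a and b down?

    def write(self, node, key, value):
        node.data[key] = value
        if not self.partitioned:
            peer = self.b if node is self.a else self.a
            peer.data[key] = value  # synchronous replication to the peer

    def read(self, node, key):
        if self.partitioned:
            # CP choice: the node cannot confirm its replica is current,
            # so it returns an error instead of possibly stale data.
            raise RuntimeError(f"node {node.name}: partitioned, refusing read")
        return node.data.get(key)


c = Cluster()
c.write(c.a, "seats_left", 1)   # one seat left, replicated to node b
c.partitioned = True            # network fault splits a and b
c.write(c.a, "seats_left", 0)   # last seat booked via node a; b is NOT updated
try:
    c.read(c.b, "seats_left")   # CP system: error rather than a stale "1 seat"
except RuntimeError as e:
    print(e)
```

An AP system would instead return node b's stale value (1) here; the sketch makes the CP choice of failing the read explicit.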
EXPERIMENT-2
Aim: NoSQL vs RDBMS, types and classification of NOSQL Databases
Relational databases (RDBMS) and NoSQL databases differ primarily in how they store and manage data.
RDBMS uses a structured format with predefined schemas and tables, making it ideal for transactional systems
where data consistency and integrity are paramount. In contrast, NoSQL databases offer flexibility with
dynamic schemas, supporting diverse data models such as document, key-value, columnar, and graph stores.
NoSQL databases are built for scalability and performance in handling large-scale, distributed data, often
sacrificing strict consistency for faster, more flexible operations. Each is suited to different use cases, depending
on the application needs.
Document-based databases
Key-value stores
Column-oriented databases
Graph-based databases
Document-Based databases:
The document-based database is a nonrelational database. Instead of storing the data in rows and columns
(tables), it uses the documents to store the data in the database. A document database stores data in JSON,
BSON, or XML documents.
Documents can be stored and retrieved in a form that is much closer to the data objects used in applications
which means less translation is required to use these data in the applications. In the Document database, the
particular elements can be accessed by using the index value that is assigned for faster querying.
Key-Value Stores:
A key-value store is a nonrelational database. The simplest form of a NoSQL database is a key-
value store. Every data element in the database is stored in key-value pairs. The data can be
retrieved by using a unique key allotted to each element in the database. The values can be simple
data types like strings and numbers or complex objects.
Column-Oriented databases:
A column-oriented database is a non-relational database that stores data in columns instead of
rows. That means when we want to run analytics on a small number of columns, we can read
those columns directly without consuming memory on unwanted data.
Columnar databases are designed to read data more efficiently and retrieve it with greater
speed. A columnar database is used to store large amounts of data.
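The difference can be sketched in Python with hypothetical data: a row store keeps each record together, while a column store keeps each column contiguous, so an aggregate touches only the column it needs.

```python
# Row store: each record kept together; reading one field scans every record.
rows = [
    {"id": 1, "city": "Delhi",  "sales": 120},
    {"id": 2, "city": "Mumbai", "sales": 340},
    {"id": 3, "city": "Pune",   "sales": 210},
]

# Column store: each column kept contiguously; an aggregate over 'sales'
# reads only that column and never touches 'id' or 'city'.
columns = {
    "id":    [1, 2, 3],
    "city":  ["Delhi", "Mumbai", "Pune"],
    "sales": [120, 340, 210],
}

total = sum(columns["sales"])   # touches a single column
print(total)                    # 670
```

Real columnar engines add compression and vectorized scans on top of this layout, but the access-pattern advantage is the same.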
Graph-Based databases:
Graph-based databases focus on the relationships between elements. They store data in the form
of nodes in the database. The connections between the nodes are called links or relationships.
EXPERIMENT-3
AIM: A Case Study on Key Value Database
Objectives: -
● Understand the basic concepts of Key Value Database in NoSQL.
● Learn the key features of Key Value Database.
● Understand the concept of a key-value store in a non-relational database.
Introduction: Key value databases, also known as key value stores, are database types where data is stored in a
“key-value” format and optimized for reading and writing that data. The data is fetched by a unique key or a
number of unique keys to retrieve the associated value with each key. The values can be simple data types like
strings and numbers or complex objects.
MongoDB covers a wide range of database examples and use cases, supporting key-value pair data concepts.
With its flexible schema and rich query language with secondary indexes, MongoDB is a compelling store for
“key-value” data. Learn more in this article and try it with MongoDB Atlas, MongoDB’s Database-as-a-Service
platform.
Over the years, database systems have evolved from legacy relational databases storing data in rows and
columns to NoSQL distributed databases allowing a solution per use case. Key-value pair stores are not a new
concept and have been with us for decades. One well-known store is the old Windows Registry, which allows
the system and applications to store data in a "key-value" structure, where a key can be represented as a
unique identifier or a unique path to the value.
Data is written (inserted, updated, and deleted) and queried based on the key to store/retrieve its value.
A key-value database, AKA key-value store, associates a value (which can be anything from a number or
simple string to a complex object) with a key, which is used to keep track of the object. In its simplest form, a
key-value store is like a dictionary/array/map object as it exists in most programming paradigms, but which is
stored in a persistent way and managed by a Database Management System (DBMS).
Key-value databases use compact, efficient index structures to be able to quickly and reliably locate a value by
its key, making them ideal for systems that need to be able to find and retrieve data in constant time. Redis, for
instance, is a key-value database that is optimized for tracking relatively simple data structures (primitive types,
lists, heaps, and maps) in a persistent database. By only supporting a limited number of value types, Redis is
able to expose an extremely simple interface to querying and manipulating them, and when configured
optimally is capable of high throughput.
A key-value database is defined by the fact that it allows programs or users of programs to retrieve data by
keys, which are essentially names, or identifiers, that point to some stored value. Because key-value databases
are defined so simply, but can be extended and optimized in numerous ways, there is no global list of features,
but there are a few common ones:
Retrieving a value (if there is one) stored and associated with a given key
Deleting the value (if there is one) stored and associated with a given key
Setting, updating, and replacing the value (if there is one) associated with a given key
Modern applications will probably require more than the above, but this is the bare minimum for a key-value
store.
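The three operations above can be sketched as a small dictionary-backed store in Python. This is an in-memory model for illustration only, not a persistent DBMS; the class and key names are made up:

```python
class KeyValueStore:
    """Minimal in-memory key-value store exposing the three core operations."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Retrieve the value (if there is one) associated with a given key.
        return self._data.get(key)

    def set(self, key, value):
        # Set, update, or replace the value associated with a given key.
        self._data[key] = value

    def delete(self, key):
        # Delete the value (if there is one) associated with a given key.
        self._data.pop(key, None)


store = KeyValueStore()
store.set("session:ueyrt-jshdt", {"user": "john", "ttl": 3600})
print(store.get("session:ueyrt-jshdt"))
store.delete("session:ueyrt-jshdt")
print(store.get("session:ueyrt-jshdt"))  # None
```

A production store adds persistence, expiry, and concurrency control on top of exactly this interface.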
There are several use-cases where choosing a key value store approach is an optimal solution:
Real-time random data access, e.g., user session attributes in an online application such as gaming or
finance.
Caching mechanism for frequently accessed data or configuration based on keys.
Application is designed on simple key-based queries.
MongoDB stores data in collections, which are groups of BSON (Binary JSON) documents where each
document is essentially built from a field-value structure. The ability of MongoDB to efficiently store
flexible-schema documents and index any of their fields for random seeks makes it a compelling
key-value store.
{
session_id : "ueyrt-jshdt-6783-utyrts",
create_time : 1122384666000
}
Further, MongoDB’s document values allow nested key-value structures, allowing not only for accessing data
by key in a global sense, but accessing and manipulating data associated with keys within documents, and even
creating indexes that allow fast retrieval by these secondary kinds of keys.
{
  name: "John",
  age: 35,
  dob: ISODate("1990-05-01"),
  profile_pic: "https://example.com/john.jpg",
  social: {
    twitter: "@mongojohn",
    linkedin: "https://linkedin.com/abcd_mongojohn"
  }
}
MongoDB’s native drivers support multiple top used languages like Python, C#, C++, and Node.js, allowing
you to store the key value data in your language of choice.
Each one of the fields can be indexed based on your query patterns. For example, if we look up a specific
session_id as the key and the create_time as the value, we can create the index
db.sessions.createIndex({session_id : 1}) and query on that key.
Wildcard indexing allows users to index every field or a subset of fields in a MongoDB collection. Therefore,
if we have a set of field-value types stored in a single document and queries could come dynamically for each
identifier, we can create a single index for those field-value sets.
db.profiles.createIndex({"$**" : 1 });
As a result, our queries will have full per-field-value query support from this index. Having said that, wildcard
indexing should only be used when we cannot predict the field names upfront and the variety of query
predicates requires it. See the wildcard index restrictions for more information.
Since MongoDB documents can be complex objects, applications can use a schema design to minimize index
footprints and optimize access for a “key-value” approach. This design pattern is called the Attribute
Pattern and it utilizes arrays of documents to store a “key-value” structure.
attributes: [
{
key: "USA",
value: ISODate("1977-05-20T01:00:00+01:00")
},
{
key: "France",
value: ISODate("1977-10-19T01:00:00+01:00")
},
{
key: "Italy",
value: ISODate("1977-10-20T01:00:00+01:00")
},
{
key: "UK",
value: ISODate("1977-12-27T01:00:00+01:00")
},
...
]
Indexing {attributes.key : 1 , attributes.value : 1} will allow us to search on any key with just one index.
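Outside the database, the effect of that single compound index can be mimicked in Python: a sorted list of (key, value) pairs stands in for the {attributes.key : 1, attributes.value : 1} index, and one lookup routine serves every country key. This is illustrative only; the dates follow the example above, and real index behavior belongs to the database engine.

```python
from bisect import bisect_left

# Attribute Pattern: heterogeneous keys stored as uniform {key, value} pairs,
# so a single (key, value) index serves queries on any of them.
release_dates = [
    {"key": "USA",    "value": "1977-05-20"},
    {"key": "France", "value": "1977-10-19"},
    {"key": "Italy",  "value": "1977-10-20"},
    {"key": "UK",     "value": "1977-12-27"},
]

# A sorted (key, value) list stands in for the compound index.
index = sorted((a["key"], a["value"]) for a in release_dates)

def lookup(country):
    # Binary search by key; any country is served by the same single index.
    i = bisect_left(index, (country,))
    if i < len(index) and index[i][0] == country:
        return index[i][1]
    return None

print(lookup("France"))  # 1977-10-19
```

Without the pattern, each distinct field (USA, France, ...) would need its own index; with it, one compound index answers them all.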
Databases supporting key-value stores persist the data to disk in the database files, while a key-value
cache implementation will mostly keep the data loaded in memory. In case of a server fault or restart, the data
needs to be reloaded into the cache, as it was not persisted.
MongoDB uses the cache of its WiredTiger engine to optimize data access and read performance together with
strong consistency and high availability across replica sets. This allows for more resilient and available field-
value stores while still using the best performance of cached data.
A key-value approach allows defining an efficient and compact data structure to access data through simple
key-value fetch/update/remove operations.
MongoDB documents can form compact flexible structures to support fast indexing for your key-value stores.
On the other hand, MongoDB documents may consist of rich objects which can contain entire hierarchies and
sub-values, and sophisticated indexing allows documents to be retrieved by any number of different keys.
Conclusion
Key-value stores are used for use cases where applications will require values to be retrieved fast via keys, like
maps or dictionaries in programming languages. The compact structure and straightforward indexing and
seeking through those indexes makes this database concept a win for specific application workloads.
However, modern applications will probably require more than just a key-value retrieval and this is where
MongoDB and MongoDB Atlas offer the optimal solution. MongoDB can support the field-value store solution
while allowing complex objects to be formed and multiple ways to query the data: Full-Text
Search, the Aggregation Framework, Atlas Data Tiering, or scaling across multiple shards. Try MongoDB
Atlas as your key-value database and reveal new possibilities to innovate your applications.