NoSQL
MongoDB features
Rich query support
Indexing
Replication
Load balancing
File storage
Aggregation
MongoDB Collections
MongoDB Documents
Uses of MongoDB
Applications of MongoDB
User profiles
Product and catalog data
Metadata
Content
Limitations of MongoDB
Indexing
Replication in MongoDB
Automatic failover in replication
Sharded clusters
Cassandra Features
No single point of failure
Tunable consistency
Data center awareness
Linear scalability
Key Features
Redis Antipatterns
Security
Execute multiple commands together in transaction blocks (Thalla et al., 2015), instead of
running them individually
ACID (atomicity, consistency, isolation, and durability)
BASE (basically available, soft state, and eventually consistent) implies that the database will, at
some point, become consistent.
... classify and index the content to improve the findability/discoverability of data or
information contained in the text or the object.
NoSQL features
1. Flexible Data Model
Relational and NoSQL data models are very different. The relational model takes data
and separates it into many interrelated tables that contain rows and columns. Tables
reference each other through foreign keys, which are stored in columns as well. When
looking up data, the desired information needs to be collected from many tables and
combined before it can be provided to the application or the users. Similarly, when
writing data, the write needs to be coordinated and performed on many tables.
2. High Performance and Scalability
To deal with the increase in the number of concurrent users (big users) and the amount
of data (big data), applications and their underlying databases need to scale in one of
two ways: scale up or scale out. Scaling up is a centralized approach that relies
on ever-bigger servers with more memory, processing power, storage, and I/O
capacity. Scaling out is a distributed approach that uses many commodity physical
or virtual servers to handle larger user and data loads. Before NoSQL databases
became popular, the default scaling approach at the database tier was to scale up.
This was dictated by the fundamentally centralized, shared-everything architecture
of relational database technology.
NoSQL databases were developed from the ground up to be distributed and scale-out
databases. They use a cluster of physical or virtual servers to store big data and support
all the standard database operations. To scale out, additional servers are joined to the
cluster at runtime, and the data and the database operations are then spread
across the larger cluster. Since commodity servers are expected to fail frequently, NoSQL
databases are built to tolerate and recover from such failures, which makes them
highly resilient. NoSQL databases provide a much easier, near-linear approach to database
scaling. If 10,000 new users start using your application, simply add another database
server to your cluster. There is no need to modify the application as you scale since the
application always sees a single (distributed) database. NoSQL databases share some
characteristics with respect to scaling and performance.
3. Auto-Sharding
A NoSQL database automatically spreads data across servers without requiring
applications to participate. Servers can be added to or removed from the data layer
without application downtime. Most NoSQL databases also support data
replication, storing multiple copies of the same data across the cluster and even across
data centers to ensure high availability (HA) and to support disaster recovery (DR). A
properly managed NoSQL database system should never need to be taken offline for
any reason. NoSQL databases are therefore both scalable and highly available.
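The idea of spreading data across servers without involving the application can be illustrated with a minimal sketch. It assumes simple hash-modulo placement; the function names `shard_for` and `distribute` are hypothetical, and real NoSQL databases use more elaborate schemes (ranges, consistent hashing) so that adding a server does not move most keys.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard that would own them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user:{n}" for n in range(1000)]
placement = distribute(keys, 4)
for shard_id, owned in placement.items():
    # Each shard ends up with roughly a quarter of the keys.
    print(shard_id, len(owned))
```

The application only ever supplies a key; which server holds it is decided inside the data layer.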
5. Integrated Caching
To reduce latency and increase sustained data throughput, advanced NoSQL database
technologies transparently cache data in system memory. This behavior is transparent
to the application developer and the operations team. In relational technology, by
contrast, caching is usually a separate infrastructure tier that must be developed
against, deployed on separate servers, and explicitly managed by the operations team.
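A minimal sketch of what "transparent" caching means for the caller is given here. The class `CachedStore` is hypothetical and uses plain dictionaries to stand in for memory and slower storage; it is not how any particular database implements its cache.

```python
class CachedStore:
    """Toy store that keeps recently read values in memory (read-through cache)."""

    def __init__(self, backing):
        self._backing = backing   # stands in for slower disk storage
        self._cache = {}          # stands in for system memory
        self.backing_reads = 0    # counts how often the slow path is taken

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        self.backing_reads += 1
        value = self._backing[key]
        self._cache[key] = value
        return value

    def put(self, key, value):
        self._backing[key] = value
        self._cache[key] = value  # write-through keeps the cache consistent


store = CachedStore({"user:1": {"name": "Ada"}})
first = store.get("user:1")   # served from backing storage
second = store.get("user:1")  # served from memory
```

The caller uses the same `get` call in both cases; no separate caching tier appears in application code.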
Structured data and moderate data volumes | Multi-structured data and high data volumes
1. Architecture
Some NoSQL databases like MongoDB are architected in a master/slave model, whereas
NoSQL databases such as Cassandra are designed in a “masterless” fashion. That is,
every node in a database cluster has the same capability and functionality. The
architecture of a NoSQL database greatly impacts how well the database system
supports various functional as well as nonfunctional requirements such as uptime,
multi-geography data replication, predictable performance, and more.
2. Data model
If the data you need to store can be represented in a simple table structure, an RDBMS is
sufficient. But complex data with multiple levels of nesting is difficult to
model in relational tables. Multilevel nesting, for example, can easily be represented
in JSON, which many NoSQL databases support. NoSQL databases are often classified by
the data model they support. Some support a wide-row tabular store, while others offer
a model that is document-oriented, key-value, or graph. Not all aspects of the data
model are known at design time. Therefore, if the schema is likely to be modified,
NoSQL databases are usually the better option.
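As a small illustration of multilevel nesting in JSON, the sketch below uses Python's standard `json` module. The `profile` record and its field names are made up for the example.

```python
import json

# A record with several levels of nesting (all field names are illustrative).
profile = {
    "name": "Ada",
    "contact": {
        "email": "ada@example.com",
        "address": {"city": "London", "geo": {"lat": 51.5, "lon": -0.1}},
    },
}

encoded = json.dumps(profile)   # one JSON value holds the whole structure
decoded = json.loads(encoded)

city = decoded["contact"]["address"]["city"]
lat = decoded["contact"]["address"]["geo"]["lat"]

# A later schema change: add a field to this record without touching any others.
decoded["contact"]["phones"] = ["+44 20 0000 0000"]
```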
4. Development model
NoSQL databases differ on their development APIs with some supporting SQL-like
languages (e.g., Cassandra's CQL).
MongoDB
MongoDB features
● Rich query support
We can query the database much as we do with SQL databases. MongoDB has a large set
of query operations that support insert, update, delete, and select. It supports
queries on fields, range queries, and regular expressions. Queries also support
projection, where they return only the values of specified keys.
● Indexing
MongoDB supports primary and secondary indexes on document fields.
● Replication
Replication means keeping more than one copy of the data. MongoDB keeps multiple
copies of the data on multiple servers. This provides fault tolerance: if one database
server goes down, the application uses the other database servers.
● Load balancing
Replica sets provide multiple copies of data. MongoDB can scale read operations by
routing client read requests directly to secondary nodes. This divides the load across
multiple servers.
● File storage
We can store files of up to 16 MB directly in a MongoDB document. For files
exceeding the 16 MB document size limit, MongoDB provides GridFS, which stores them in chunks.
● Aggregation
An aggregate function takes a number of records and calculates a single result, such as
a sum, min, or max. MongoDB provides a multistage aggregation pipeline that passes
large amounts of data through successive stages to the aggregate functions, which
improves performance.
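The multistage pipeline idea can be sketched in plain Python. This is not MongoDB code: `match` and `group_sum` are hypothetical helpers that only mimic the shape of a filter stage followed by a grouping stage, and the `orders` data is invented.

```python
from collections import defaultdict

orders = [
    {"customer": "ada", "status": "paid", "amount": 30},
    {"customer": "ada", "status": "paid", "amount": 20},
    {"customer": "bob", "status": "open", "amount": 99},
    {"customer": "bob", "status": "paid", "amount": 5},
]


def match(docs, **criteria):
    """Stage 1: keep only documents whose fields equal the given values."""
    return (d for d in docs if all(d.get(k) == v for k, v in criteria.items()))


def group_sum(docs, key, field):
    """Stage 2: group by `key` and sum `field` within each group."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key]] += d[field]
    return dict(totals)


# The output of one stage is the input of the next.
totals = group_sum(match(orders, status="paid"), key="customer", field="amount")
print(totals)
```

Filtering first means the grouping stage only sees the documents it needs, which is the main reason stage order matters in a pipeline.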
MongoDB Collections
A collection is a grouping of MongoDB documents. It is the equivalent of an RDBMS table, which
stores data in rows. A collection should only store related documents. For example, the
user_profiles collection should only store data related to user profiles. It should not contain a
user's friend list, as this is not part of a user's profile; instead, the friend list belongs in
the users_friend collection.
MongoDB Documents
Data in MongoDB is stored in the form of documents. A document is a collection of
key-value pairs. The key is also known as an attribute. Documents have a dynamic schema, so
documents in the same collection may have different sets of fields.
MongoDB documents have a special field called _id. By default, _id is a 12-byte ObjectId,
usually displayed as a 24-character hexadecimal string, that ensures the uniqueness of the
document within its collection. It is generated by MongoDB if not provided by the
developer.
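The following sketch illustrates the two points above: documents in one collection may differ in their fields, and an _id is generated only when the developer does not supply one. The list `users` stands in for a collection, and `new_object_id` is a hypothetical generator that imitates the 12-byte layout (timestamp, random value, counter); it is not the driver's implementation.

```python
import itertools
import os
import time

_counter = itertools.count(int.from_bytes(os.urandom(3), "big"))
_process_random = os.urandom(5)


def new_object_id() -> str:
    """Build a 12-byte id: 4-byte timestamp + 5 random bytes + 3-byte counter."""
    timestamp = int(time.time()).to_bytes(4, "big")
    counter = (next(_counter) % 0x1000000).to_bytes(3, "big")
    return (timestamp + _process_random + counter).hex()


users = []  # stands in for a collection


def insert(doc: dict) -> dict:
    """Store a copy of the document, generating _id only if it is missing."""
    doc = dict(doc)
    doc.setdefault("_id", new_object_id())
    users.append(doc)
    return doc


a = insert({"name": "Ada", "email": "ada@example.com"})
b = insert({"name": "Bob", "phones": ["555-0100"], "_id": "custom-42"})
```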
Uses of MongoDB:
● Document-oriented storage
● Index on any attribute
● Replication and high availability
● Auto-sharding
● Rich query support
● Professional support from MongoDB
Applications of MongoDB
● User profiles:
Authentication tools and LDAP are good for authentication and authorization, but data
such as rewards, criminal records, promotions, phone numbers, and addresses is
added day by day. Databases with a fixed schema struggle to adapt to such quickly
changing data. We can use MongoDB's dynamic documents to store such data in the
document as it accumulates over time.
● Metadata:
We often require metadata that describes our data. In such scenarios, a graph-based
database is a good choice, but we can also use MongoDB for these applications.
● Content:
MongoDB is mainly a document database. It is great for serving text as well as HTML
documents. Also, it provides fine control over storing and indexing contents.
Limitations of MongoDB
● The maximum document size supported by MongoDB is 16 MB.
● The maximum document-nesting level supported by MongoDB is 100.
● The maximum namespace length (database name + collection name) is 123 bytes in
older MongoDB releases; the exact limit varies by version.
● The database name must be shorter than 64 characters.
● In releases before MongoDB 4.2, the value of an indexed field cannot exceed 1024 bytes.
● A maximum of 64 indexes are allowed per collection and a maximum of 32 fields are
allowed in a compound index.
● A hashed index cannot be unique.
● A replica set can have at most 50 members, of which at most 7 can vote (releases
before MongoDB 3.0 allowed at most 12 members).
Insert One
Insert Many
1. db.collection.update();
2. db.collection.updateOne();
3. db.collection.updateMany();
4. db.collection.findAndModify();
5. db.collection.findOneAndUpdate();
6. db.collection.findOneAndReplace();
7. db.collection.save();
8. db.collection.bulkWrite();
Operator   Description
$gte       Matches values that are greater than or equal to a specified value
$lte       Matches values that are less than or equal to a specified value
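To make the meaning of the two operators concrete, here is a small sketch of a filter evaluator in plain Python. The `matches` function is hypothetical and handles only $gte, $lte, and plain equality; it imitates the shape of a query filter such as {age: {$gte: 18, $lte: 65}} and is not MongoDB's matching logic.

```python
import operator

# Only the two operators from the table are supported in this sketch.
OPERATORS = {"$gte": operator.ge, "$lte": operator.le}


def matches(doc: dict, query: dict) -> bool:
    """Return True if every condition in `query` holds for `doc`."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            if not all(OPERATORS[op](value, bound) for op, bound in condition.items()):
                return False
        elif value != condition:
            return False
    return True


people = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": 17},
    {"name": "Cy", "age": 70},
]
adults = [p["name"] for p in people if matches(p, {"age": {"$gte": 18, "$lte": 65}})]
```

Both bounds are inclusive, so a document with age 18 or age 65 would also match.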
db.collection.findOneAndReplace();
db.collection.findOneAndUpdate();
db.collection.findAndModify();
db.collection.save();
db.collection.bulkWrite();
db.collection.findOneAndDelete()
db.collection.findAndModify()
db.collection.bulkWrite()
When there is a one-to-many relationship between a document and its embedded documents, we can
store the embedded documents as an array of objects in a field of the parent document.
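A minimal sketch of such an embedded one-to-many relationship, written as a Python dictionary, might look as follows. The `author` document and its field names are invented for illustration.

```python
author = {
    "_id": 1,
    "name": "Ada",
    "posts": [  # the "many" side, embedded as an array of objects
        {"title": "Intro to NoSQL", "likes": 3},
        {"title": "Sharding basics", "likes": 5},
    ],
}

# Adding a child only touches the parent document.
author["posts"].append({"title": "Replica sets", "likes": 0})

# Reading the parent returns its children too; no join is needed.
titles = [post["title"] for post in author["posts"]]
total_likes = sum(post["likes"] for post in author["posts"])
```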
Indexing
Replication in MongoDB
A replica set is a group of MongoDB instances that maintain the same dataset. A replica set has
multiple data-bearing nodes and, optionally, one arbiter node. Among the data-bearing nodes, one
is the primary node and the others are secondary nodes.
All write operations happen at the primary node. Once a write occurs at the primary node, the
data is replicated across the secondary nodes internally to make copies of the data available to
all nodes and to avoid data inconsistency.
If the primary node becomes unavailable, the secondary nodes use an election algorithm to
select one of themselves as the new primary node.
Secondary nodes asynchronously copy the operations recorded by the primary node and apply
them to their own copies of the data.
Automatic failover in replication
Replica set members exchange heartbeats every 2 seconds. If the primary fails to
communicate with the other members for 10 seconds (the default election timeout), the eligible
secondary nodes hold an election to choose a new primary among them. The first secondary node
that calls the election and receives a majority of votes becomes the primary node. If there is
an arbiter node, its vote is also counted when choosing the primary.
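The majority rule in an election can be sketched as follows. This is a simplification under stated assumptions: the `elect` function and the member names are hypothetical, every listed member has one vote, and the real protocol (terms, priorities, oplog freshness) is ignored.

```python
def elect(candidate: str, votes: dict, members: list) -> bool:
    """Return True if `candidate` holds votes from a strict majority of all voting members."""
    granted = sum(1 for member in members if votes.get(member) == candidate)
    return granted > len(members) // 2


# node-a was the primary and is now unreachable, so it casts no vote.
members = ["node-a", "node-b", "node-c", "arbiter"]

# With the arbiter reachable, node-b collects 3 of 4 votes and wins.
votes_with_arbiter = {"node-b": "node-b", "node-c": "node-b", "arbiter": "node-b"}
won = elect("node-b", votes_with_arbiter, members)

# Without the arbiter, 2 of 4 votes is not a majority, so no primary is elected.
votes_without_arbiter = {"node-b": "node-b", "node-c": "node-b"}
stalled = not elect("node-b", votes_without_arbiter, members)
```

The second case shows why the arbiter's vote matters even though the arbiter holds no data.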
Sharded clusters
MongoDB's sharding consists of the following components:
● Shard: Each shard stores a subset of sharded data. Also, each shard can be deployed as a
replica set.
● Mongos: mongos provides the interface between a client application and the sharded
cluster and routes queries to the appropriate shards.
● Config server: The configuration server stores the metadata and configuration settings
for the cluster. MongoDB data is sharded at the collection level and distributed
across the shards in the cluster.
● Shard keys: To distribute the documents in a collection, MongoDB partitions the
collection using the shard key. MongoDB splits sharded data into chunks, and these
chunks are distributed across the shards in the cluster.
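How a router could use chunk ranges to find the right shard can be sketched as below. The chunk boundaries, shard names, and the `route` function are all hypothetical; the sketch assumes an integer shard key and range-based chunks, and it leaves out chunk splitting and balancing.

```python
import bisect

# Hypothetical chunk map: chunk i covers shard-key values from its lower bound
# up to (but not including) the next chunk's lower bound.
CHUNK_LOWER_BOUNDS = [0, 1000, 2000, 3000]
CHUNK_TO_SHARD = ["shard-a", "shard-b", "shard-a", "shard-c"]


def route(shard_key_value: int) -> str:
    """Return the shard whose chunk range contains the shard key value."""
    if shard_key_value < CHUNK_LOWER_BOUNDS[0]:
        raise ValueError("value below the first chunk's lower bound")
    index = bisect.bisect_right(CHUNK_LOWER_BOUNDS, shard_key_value) - 1
    return CHUNK_TO_SHARD[index]


targets = [route(v) for v in (0, 999, 1000, 2500, 99999)]
```

In this picture the chunk map plays the role of the metadata held by the config server, and `route` plays the role of the query router.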
Cassandra
Cassandra is an open source, distributed, non-relational, partitioned row store. Cassandra rows
are organized into tables and indexed by a key. It uses an append-only, log-based storage
engine. Data in Cassandra is distributed across multiple masterless nodes, with no single point
of failure. It is a top-level Apache project, and its development is currently overseen by the
Apache Software Foundation (ASF).
Each individual machine running Cassandra is known as a node. Nodes configured to work
together and support the same dataset are joined into a cluster (also called a ring). Cassandra
clusters can be further subdivided based on geographic location, by being assigned to a logical
data center (and potentially even further into logical racks). Nodes within the same data center
share the same replication factor, or configuration, that tells Cassandra how many copies of a
piece of data to store on the nodes in that data center. Nodes within a cluster are kept informed
of each other's status by the Gossiper. All of these components work together to abstract
Cassandra's distributed nature from the end user.
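The relationship between the ring and the replication factor can be illustrated with a simplified sketch. The `Ring` class is hypothetical: it hashes node names with MD5 to place them on the ring and assigns one token per node, and it ignores data centers and racks, so it only approximates how replicas are chosen.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Hash a node name or partition key to a position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor):
        self.replication_factor = replication_factor
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, partition_key: str):
        """Walk clockwise from the key's token and take RF distinct nodes."""
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        count = min(self.replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


ring = Ring(["node1", "node2", "node3", "node4"], replication_factor=3)
owners = ring.replicas("user:42")
```

With a replication factor of 3 on four nodes, every key is stored on three different nodes, so losing any single node leaves two copies available.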
Cassandra Features
Redis
REmote DIctionary Server (Redis) is an open source, key-value, single-threaded, in-memory
data store that is commonly referred to as a data structure server. It is capable of functioning as
a NoSQL database, key/value store, a cache layer, and a message broker (among other things).
Redis is known for its speed, as it can store complex data structures in memory, and serve them
to multiple users or applications.
Redis was primarily designed to serve as an in-memory cache, intended to support atomic
operations on a single server. It was written (in C) by Salvatore Sanfilippo, who used it to replace
the MySQL instance running at his start-up. Clustering options are available (as of Redis 3.0)
with the advent of Redis Cluster. It is important to note that in terms of distributed systems,
these two configurations do behave differently:
Key Features
● Performance
The underlying idea behind Redis is very straightforward: to read and write as much data
as possible in RAM. As the majority of the operations do not include disk or network I/O,
Redis is able to serve data very quickly.
● Publish/Subscribe
Better known as Pub/Sub, this allows users to be alerted to updates on channels. When
a message is published to a channel, Redis sends the message to all of the subscribers.
This functionality has many uses for multi-user or social websites, mainly around
notifications for chat messages, tweets, emails, and other events.
● Counters
While the main data type used in Redis is a string, string values can be incremented and
decremented if they are numeric. This can be useful for counting things like page views
and access totals. An operation performed on a counter is an atomic operation, and
(when complete) the new, current value is returned.
● Queues
Unlike many data stores, Redis can easily support queue-like functionality. Updates and
deletes can be performed with minimal overhead, and it has data types available to
work with Last In First Out (LIFO) and First In First Out (FIFO) queuing scenarios. Redis
can also keep the number of items in a queue at the desired size, as well as provide
methods for adding an item to the top or to the bottom of a list structure.
● Sets
Sometimes applications may require the ability to keep track of not just the frequency or
log of events, but a list of unique items. For these scenarios, Redis has the set data
structure. Sets in Redis are essentially unique lists of items.
● Sorted sets
For those times when a set of unique items needs to be presented in a specific order,
Redis offers the sorted set structure. In this way, unique items are added to a set and
ordered by a score. Check the Using Redis section for an example of using a sorted set
from the redis-cli.
● Notifications
Sometimes an application may require notifications for certain events. Let's assume that
a message board is being developed, and a user needs to be notified if someone
responds to one of their posts. In this case, the Redis Publish/Subscribe functionality
could be used. Each user's client could subscribe to that user's own channel, and then
certain events could trigger a publish to that channel.
● counters
● caching
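Three of the features described above (counters, queue-like lists, and Publish/Subscribe) can be sketched with an in-memory stand-in. `MiniStore` is hypothetical: its method names loosely mirror the Redis commands INCR, LPUSH, LPOP, RPOP, SUBSCRIBE, and PUBLISH, but it is not a Redis client, does not talk to a server, and is not thread-safe.

```python
from collections import defaultdict, deque


class MiniStore:
    """In-memory stand-in that imitates a few Redis-style operations."""

    def __init__(self):
        self._values = {}
        self._lists = defaultdict(deque)
        self._subscribers = defaultdict(list)

    def incr(self, key):
        """Increment a numeric value and return the new value."""
        self._values[key] = int(self._values.get(key, 0)) + 1
        return self._values[key]

    def lpush(self, key, item):
        """Add an item to the head (left) of a list."""
        self._lists[key].appendleft(item)

    def lpop(self, key):
        """Remove and return the item at the head of a list."""
        return self._lists[key].popleft()

    def rpop(self, key):
        """Remove and return the item at the tail of a list."""
        return self._lists[key].pop()

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        """Deliver the message to every subscriber; return how many received it."""
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


store = MiniStore()

# Counter: two page views.
store.incr("page:home:views")
views = store.incr("page:home:views")

# Queue: push three jobs at the head, then read from either end.
for job in ("a", "b", "c"):
    store.lpush("jobs", job)
fifo = store.rpop("jobs")  # oldest item: first in, first out
lifo = store.lpop("jobs")  # newest item: last in, first out

# Pub/Sub: a user subscribed to their own channel is notified of a reply.
inbox = []
store.subscribe("user:7", inbox.append)
delivered = store.publish("user:7", "new reply")
```

Pushing at one end and popping from the other gives FIFO behavior, while pushing and popping at the same end gives LIFO behavior.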
Redis Antipatterns
● Security
Redis is designed to be run on a network that is secured behind an enterprise-grade
firewall. That being said, Redis does come with features designed to help you tighten up
security. Not using those security features, or doing things like exposing Redis IP
addresses to the internet, is dangerous and can have disastrous consequences.
○ Reuse connection objects; avoid constantly creating/destroying connections to
Redis
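The advice to reuse connection objects can be illustrated with a small pool. Both classes are hypothetical: `FakeConnection` stands in for a real network connection, and `ConnectionPool` is a bare-bones sketch rather than the pooling offered by any Redis client library.

```python
class FakeConnection:
    """Hypothetical stand-in for a network connection to Redis."""

    opened = 0  # counts how many connections were ever created

    def __init__(self):
        FakeConnection.opened += 1

    def send(self, command):
        return f"OK {command}"


class ConnectionPool:
    """Create a fixed number of connections once and hand them out repeatedly."""

    def __init__(self, size):
        self._idle = [FakeConnection() for _ in range(size)]

    def acquire(self):
        return self._idle.pop()

    def release(self, conn):
        self._idle.append(conn)


pool = ConnectionPool(size=2)
replies = []
for n in range(100):
    conn = pool.acquire()
    try:
        replies.append(conn.send(f"PING {n}"))
    finally:
        pool.release(conn)  # return the connection instead of destroying it
```

One hundred commands are served by the two connections opened at startup, instead of opening and closing a connection per command.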
Network/firewall
Redis requires the following TCP port to be accessible: