2022 - 11 - 01 - MongoDB Top 7 NoSQL Considerations
JULY 2021
Top 7 Considerations When
Evaluating NoSQL Databases
Table of Contents
Introduction
Data Model
Document Model
Graph Model
Key-Value Databases and Wide-Column Models
Query Model
Document Database
Graph Database
Key-Value Databases and Wide-Column Models
Consistent Systems
Eventually Consistent Systems
APIs
Idiomatic Drivers
RESTful APIs
SQL-Like APIs
Visualization and Reporting
Mobile Data
Schema Flexibility
Edge-to-Cloud Synchronization
Data Platform
Final Considerations
Commercial Support
Community Strength
Freedom From Lock-In
Why MongoDB?
Resources
Introduction
Data and software are at the heart of business today. But for many organizations, realizing the full
potential of the digital economy remains a significant challenge. Since the inception of MongoDB,
we’ve understood that the biggest challenge organizations face is working with data:
° Demands for higher productivity and faster time to market are being held back by
rigid relational data models that are mismatched to modern code and impose complex
interdependencies among engineering teams.
° Organizations are unable to work with, and extract insights from, massive increases in the
new and rapidly changing structured, semi-structured, and polymorphic data generated by
today’s applications.
° Monolithic and fragile legacy databases inhibit the wholesale shift to distributed systems
and cloud computing that deliver the resilience and scale businesses need, making it harder
to satisfy new regulatory requirements for data privacy.
° Previously separate transactional, analytical, search, and mobile workloads are converging
to create rich, data-driven applications and customer experiences. However, each workload
traditionally has been powered by its own database, creating duplicated data silos stitched
together with fragile ETL pipelines accessed by different developer APIs.
To address these limitations, several non-tabular alternatives to relational databases have entered
the consideration set. Generally referred to as NoSQL databases, these systems discard the very
foundation that has made relational databases so useful for generations of applications: an expressive
query language, secondary indexes, and strong consistency. NoSQL databases share several key
characteristics, including a more flexible data model, higher scalability, and superior performance.
Although the term NoSQL often is used as an umbrella category for all non-tabular databases, it’s
too vague and poorly defined to be a useful descriptor of the underlying data model. Primarily, it
neglects the trade-offs NoSQL databases make to achieve flexibility, scalability, and performance.
To help technology decision-makers navigate the complex and rapidly evolving domain of NoSQL
and non-tabular databases, we’ve highlighted the key differences between them in this white paper.
We also explore critical considerations based on seven dimensions that define these systems: data
model; query model; consistency and transactional model; APIs; mobile data; data platform; and
commercial support, community strength, and freedom from lock-in.
Data Model
The primary way in which non-tabular databases differ from relational databases is the data model.
Although there are dozens of non-tabular databases, they generally fall into three categories:
document databases, graph databases, and key-value databases or wide-column stores.
Document Model
Whereas relational databases store data in rows and columns, document databases store data
in documents by using JavaScript Object Notation (JSON), a text-based data interchange format
popular among developers. Documents provide an intuitive and natural way to model data that
is closely aligned with object-oriented programming — each document is effectively an object
that matches the objects developers work with in code. Documents contain one or more fields,
and each field contains a typed value such as a string, date, binary, decimal value, or array.
Rather than spreading out a record across multiple columns and tables connected with foreign
keys, each record is stored along with its associated (i.e., related) data in a single, hierarchical
document. This model accelerates developer productivity, simplifies data access, and, in many
cases, eliminates the need for expensive join operations and complex abstraction layers such as
object relational mapping (ORM).
The schema of a relational database is defined by tables; in a document database, the notion
of a schema is dynamic — each document can contain different fields. This flexibility can be
particularly helpful for modeling data where structures can change between each record — i.e.,
polymorphic data. It also makes it easier to evolve an application during its life cycle, such as
by adding new fields. Additionally, some document databases provide the query expressivity
developers have come to expect from relational databases. In particular, data can be queried
based on any combination of fields in a document, with rich secondary indexes providing efficient
access paths to support almost any query pattern. Some document databases also offer the
option to enforce a schema on documents.
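As a minimal sketch of the contrast described above, the following Python snippet models one customer first as relational-style rows linked by a foreign key and then as a single hierarchical document. Plain dictionaries stand in for a real database, and every name and sample value here (customer_doc, loyalty_tier, the cities) is hypothetical and chosen only for illustration.

```python
# Relational style: one record spread across two "tables" joined by a foreign key.
customers = [{"id": 1, "name": "Ada Lopez"}]
addresses = [
    {"customer_id": 1, "type": "billing", "city": "Austin"},
    {"customer_id": 1, "type": "shipping", "city": "Dallas"},
]

# Document style: the same record and its related data in one hierarchical document.
customer_doc = {
    "_id": 1,
    "name": "Ada Lopez",
    "addresses": [
        {"type": "billing", "city": "Austin"},
        {"type": "shipping", "city": "Dallas"},
    ],
}

# Polymorphic data: a second document may carry different fields from the first.
other_doc = {"_id": 2, "name": "Bo Chen", "loyalty_tier": "gold"}


def cities_relational(customer_id):
    """Reassemble the related data with an application-side join."""
    return [a["city"] for a in addresses if a["customer_id"] == customer_id]


def cities_document(doc):
    """Read the related data directly from the document; no join is needed."""
    return [a["city"] for a in doc.get("addresses", [])]


print(cities_relational(1), cities_document(customer_doc))
```

Both functions return the same cities, but the document version needs no second lookup, and other_doc can omit the addresses field entirely without any schema change.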
Graph Model
Graph databases use graph structures with nodes, edges, and properties to represent data. In
essence, data is modeled as a network of relationships among specific elements. Although the
graph model may be counterintuitive, it can be useful for a specific class of queries. Its main appeal
is that it makes it easier to model and navigate relationships among entities in an application.
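To make the idea of nodes, edges, and properties concrete, here is a small sketch in plain Python. It is not tied to any particular graph database; the node names, the relationship labels, and the reachable helper are all hypothetical and exist only for this illustration.

```python
from collections import deque

# Nodes carry properties; edges are (source, relationship, target) triples.
nodes = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "WORKS_AT", "acme"),
]


def neighbors(node):
    """Return the targets of all outgoing edges from a node."""
    return [dst for src, _, dst in edges if src == node]


def reachable(start, max_hops):
    """Breadth-first traversal: map each node within max_hops to its distance."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(current):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    seen.pop(start)
    return seen


print(reachable("alice", 2))
```

The traversal answers a relationship question ("who or what is within two hops of alice?") by following edges rather than by joining tables, which is the class of query the graph model is designed for.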
Takeaways
° The key-value and wide-column data models are opaque in the system — only the primary
key can be queried.
° The document data model has the broadest applicability.
° The document data model is the most natural and productive because it maps directly to
objects in modern object-oriented languages.
° The wide-column model provides more granular access to data than the key-value model but
is less flexible than the document model.
Query Model
Each application has its own query requirements. In some cases, a basic query model may be
appropriate, where the application accesses records based only on a primary key. For most
applications, however, it’s important to have the ability to query based on several different values in
each record. For example, an application that stores data about customers may need to query by
customer name, company name, size, sales value, zip code, state, or aggregations of multiple values.
It’s also common for applications to update records, including one or more individual fields. To
satisfy these requirements, the database needs to be able to perform queries based on secondary
indexes. In these cases, a document database often will be the most appropriate solution.
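The difference between primary-key access and secondary-index access can be sketched with a toy in-memory store. This is an illustration under simplifying assumptions, not how any specific database implements indexes; CustomerStore and its methods are hypothetical names.

```python
from collections import defaultdict


class CustomerStore:
    """Toy store: a primary-key map plus optional secondary indexes."""

    def __init__(self, indexed_fields):
        self.by_id = {}
        # One index per field: field value -> set of primary keys.
        self.indexes = {field: defaultdict(set) for field in indexed_fields}

    def insert(self, record):
        self.by_id[record["id"]] = record
        for field, index in self.indexes.items():
            if field in record:
                index[record[field]].add(record["id"])

    def get(self, record_id):
        """Primary-key lookup: the only access path in a pure key-value model."""
        return self.by_id.get(record_id)

    def find(self, field, value):
        """Return matching ids, using a secondary index when one exists."""
        if field in self.indexes:
            return sorted(self.indexes[field].get(value, set()))
        # No index on this field: fall back to scanning every record.
        return sorted(r["id"] for r in self.by_id.values() if r.get(field) == value)


store = CustomerStore(indexed_fields=["state"])
store.insert({"id": 1, "name": "Ada", "company": "Acme", "state": "CA"})
store.insert({"id": 2, "name": "Bo", "company": "Globex", "state": "TX"})
store.insert({"id": 3, "name": "Cy", "company": "Initech", "state": "CA"})
print(store.find("state", "CA"), store.find("company", "Acme"))
```

Both calls to find return correct results, but only the query on state avoids touching every record; without secondary indexes, the application has to pay for the scan or maintain its own lookup structures.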
Document Database
Document databases generally provide the ability to query and update any field within a document,
although capabilities in this domain vary. Some products, such as MongoDB, provide a rich set of
indexing options to optimize a wide variety of queries and to automate data management, including
text, geospatial, compound, sparse, wildcard, time to live (TTL), unique indexes, and others.
Furthermore, some of these products enable real-time analytics against data in place without
having to replicate it to a dedicated analytics application or search engine. MongoDB, for instance,
provides an aggregation framework for developers to create processing pipelines for data analytics
and transformations via faceted search, joins, and unions; geospatial processing; materialized views;
and graph traversals. It also provides native visualization capabilities with MongoDB Charts, along
with connectors for Apache Spark and BI tools. To update data, MongoDB provides expressive
update methods that enable developers to perform complex manipulations against matching
elements of a document — including elements embedded in nested arrays — all in a single
transactional update operation. There’s also MongoDB Atlas Search, which enables full-text search
capabilities on top of your data in the cloud.
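As a rough sketch of what a processing pipeline looks like, the snippet below expresses three aggregation stages ($match, $group, and $sort) as Python data and then computes the same result in plain Python so the example runs without a database. The field names and sample documents are hypothetical; with the PyMongo driver, a pipeline like this would typically be passed to a collection's aggregate method.

```python
from collections import defaultdict

# Pipeline expressed as data, in the shape used by the aggregation framework.
pipeline = [
    {"$match": {"state": "CA"}},
    {"$group": {"_id": "$company", "total_sales": {"$sum": "$sales_value"}}},
    {"$sort": {"total_sales": -1}},
]

# Hypothetical sample documents.
customers = [
    {"company": "Acme", "state": "CA", "sales_value": 120},
    {"company": "Acme", "state": "CA", "sales_value": 80},
    {"company": "Globex", "state": "CA", "sales_value": 150},
    {"company": "Initech", "state": "TX", "sales_value": 500},
]


def total_sales_by_company(docs, state):
    """Plain-Python equivalent of the three stages above, for illustration only."""
    totals = defaultdict(int)
    for doc in docs:
        if doc["state"] == state:  # $match
            totals[doc["company"]] += doc["sales_value"]  # $group with $sum
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # $sort
    return [{"_id": company, "total_sales": total} for company, total in ranked]


print(total_sales_by_company(customers, "CA"))
```

The point of the pipeline form is that filtering, grouping, and ordering run inside the database against data in place, rather than in application code as the helper function does here.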
Graph Database
Graph databases provide rich query models in which simple and complex relationships can
be interrogated to make direct and indirect inferences about the data in the system. Although
relationship analysis tends to be efficient, other types of analysis are less optimal. As a result, graph
databases rarely are used for general-purpose operational applications. Rather, they’re often coupled
with document or relational databases to surface graph-specific data structures and queries.
For use cases involving multiple storage technologies, there’s an option to employ “multimodel”
databases in which different data models and query types are available within a single platform.
For example, MongoDB offers the $graphLookup aggregation stage for graph processing
natively within the database. $graphLookup enables efficient traversals across graphs, trees, and
hierarchical data to uncover patterns and surface previously unidentified connections.
Takeaways
° The biggest difference among non-tabular databases lies in their ability to query
data efficiently.
° Document databases provide the richest query functionality, which allows them to address a
wide variety of operational and real-time analytics applications.
° Key-value databases and wide-column stores provide a single means of accessing data:
primary keys. Although fast, they offer limited query functionality and may impose
additional development costs and application-level requirements to support more complex
query patterns.
Consistency and Transactional Model
Most non-tabular systems maintain multiple copies of data for availability and scalability purposes.
These databases can impose different guarantees on the consistency of data across copies.
Non-tabular databases tend to be categorized as either strongly consistent or eventually consistent.
With a strongly consistent system, writes by the application are immediately visible in subsequent
queries. With an eventually consistent system, the visibility of writes depends on which data replica
is serving the query. For example, when reflecting inventory levels for products in a product catalog,
with a consistent system each query will see the current inventory as it’s updated by the application,
whereas with an eventually consistent system, the inventory levels may not be accurate for a query
at a given time but will eventually become accurate as data is replicated across all nodes in the
database cluster. For this reason, application code tends to be somewhat different for eventually
consistent systems — rather than updating the inventory by taking the current inventory and
subtracting one, for example, developers are encouraged to issue idempotent queries that explicitly
set the inventory level. Developers also need to build additional control logic in their apps to handle
potentially stale or deleted data.
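The inventory example can be sketched in a few lines of Python. The function names and the duplicate-delivery scenario are assumptions made for illustration; the point is only that a relative update is not safe to apply twice, whereas an update that sets an explicit value is.

```python
def apply_decrement(inventory, sku):
    """Relative update: the result depends on the value currently held."""
    inventory[sku] -= 1


def apply_set(inventory, sku, level):
    """Absolute update: applying it once or many times gives the same result."""
    inventory[sku] = level


# The same logical operation delivered twice, for example after a retry.
relative = {"widget": 10}
apply_decrement(relative, "widget")
apply_decrement(relative, "widget")  # duplicate delivery

absolute = {"widget": 10}
apply_set(absolute, "widget", 9)
apply_set(absolute, "widget", 9)  # duplicate delivery

print(relative["widget"], absolute["widget"])  # prints: 8 9
```

One item was sold in both cases, yet only the idempotent version ends at the correct level of 9. The cost is that the application must already know the intended final value, which is part of the extra control logic mentioned above.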
Most non-tabular systems offer atomicity guarantees at the level of an individual record. Atomicity is
one of four transaction properties that constitute ACID transactions. The four properties in an ACID
transaction are:
° Atomicity
° Consistency
° Isolation
° Durability
The point of ACID transactions is to guarantee data validity despite errors, power failures, and other
mishaps. Atomicity is an assurance that database operations are indivisible or irreducible such that
either all operations complete or none complete. Because these databases can combine related
data that otherwise would be modeled across separate parent-child tables in a tabular schema,
atomic single-record operations provide transaction semantics that meet the data integrity needs of
the majority of applications.
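A single-record atomic update can be illustrated with a simple copy-and-swap sketch. This is a single-threaded illustration of all-or-nothing semantics, not a description of how any database engine implements atomicity; atomic_update and the order document are hypothetical.

```python
import copy


def atomic_update(document, mutate):
    """Apply mutate() to a copy and swap it in only if every step succeeds."""
    working = copy.deepcopy(document)
    mutate(working)  # may raise part-way through; the original is untouched
    document.clear()
    document.update(working)  # all changes become visible together


# Related data (order header and line items) lives in one document.
order = {
    "_id": 42,
    "status": "open",
    "items": [{"sku": "widget", "qty": 2}],
    "total": 20,
}


def add_item(doc):
    doc["items"].append({"sku": "gadget", "qty": 1})
    doc["total"] += 15


def broken_change(doc):
    doc["status"] = "shipped"
    raise ValueError("payment declined")  # fails after a partial change


atomic_update(order, add_item)
try:
    atomic_update(order, broken_change)
except ValueError:
    pass

print(order["status"], order["total"], len(order["items"]))
```

Because the line items and the total sit in the same document, one atomic operation keeps them consistent with each other; in a tabular schema the same change would span a parent table and a child table.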
It’s important to note that some developers and database administrators have been conditioned
by 40 years of relational data modeling to assume multirecord transactions are a requirement
for any database, regardless of the underlying data model. Some are concerned that although
multidocument transactions aren’t needed by their apps today, they might be in the future. And for
some workloads, support for ACID transactions across multiple records is required.
MongoDB added support for multidocument ACID transactions in 2018 so developers could
address a wider range of use cases with the familiarity of how transactions are handled in relational
databases. Through snapshot isolation, transactions provide a consistent view of data and enforce
all-or-nothing execution. MongoDB is one of the few non-tabular databases to combine the transactional
guarantees of traditional relational databases with the flexibility and scale of the non-tabular model.
Consistent Systems
Applications can have different requirements for data consistency. For many applications, it’s
imperative for data to be consistent at all times. Because development teams have worked under
a model of consistency with relational databases for decades, this approach is more natural and
familiar. In other cases, eventual consistency is an acceptable trade-off for the flexibility it allows in
the system’s availability.
Document and graph databases can be consistent or eventually consistent. MongoDB provides
tunable consistency. By default, data is consistent — all writes and reads access the primary copy
of the data. As an option, read queries can be issued against secondary copies where data may be
eventually consistent if the write operation has not yet been synchronized with the secondary copy;
the consistency choice is made at the query level.
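The effect of choosing consistency per query can be sketched with a toy model of one primary and one lagging secondary. The ReplicaSet class, its methods, and the prefer argument are hypothetical names used for illustration only; they are not a driver API.

```python
class ReplicaSet:
    """Toy model: a primary and a secondary that lags until replicate() runs."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes not yet applied to the secondary

    def write(self, key, value):
        """All writes go to the primary copy."""
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply outstanding writes to the secondary, in order."""
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()

    def read(self, key, prefer="primary"):
        """The consistency choice is made per read via the prefer argument."""
        source = self.primary if prefer == "primary" else self.secondary
        return source.get(key)


rs = ReplicaSet()
rs.write("stock", 5)
before = (rs.read("stock"), rs.read("stock", prefer="secondary"))
rs.replicate()
after = (rs.read("stock"), rs.read("stock", prefer="secondary"))
print(before, after)  # prints: (5, None) (5, 5)
```

Reads from the primary see the write immediately, while a read from the secondary returns stale data until replication catches up. An application can accept that staleness for some queries and not for others.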
Takeaways
° Different consistency models pose different trade-offs for applications in the areas of
consistency, availability, and performance.
° MongoDB provides tunable consistency, defined at the query level.
° Eventually consistent systems provide some advantages for inserts at the cost of making
reads, updates, and deletes more complex, while incurring performance overhead via read
repairs and compactions.
° Most non-tabular databases provide single-record atomicity. This is sufficient for many
applications but not all. MongoDB provides multidocument ACID guarantees, making it
easier to address a range of use cases with a single data platform.
APIs
There is no standard for interfacing with non-tabular systems. Each system presents different
designs and capabilities to application developers. The maturity of the API can affect the time and
cost required for developing and maintaining the application and database.
Idiomatic Drivers
Programming languages provide different paradigms for working with data and services. Idiomatic
drivers are created by development teams that are experts in a given language and know how
programmers prefer to work within a language. This approach can also provide efficiencies for
accessing and processing data by leveraging specific features in a programming language.
Because idiomatic drivers are easier for developers to learn and use, they reduce the onboarding
time required for teams to begin working with a database. For example, idiomatic drivers provide
direct interfaces to set and get documents or fields within documents. With other types of
interfaces, it may be necessary to retrieve and parse entire documents and navigate to specific
values in order to set or get a field.
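The contrast between field-level access and whole-document parsing can be sketched as follows. The get_field and set_field helpers are hypothetical stand-ins for the kind of direct interface described above, not functions from any real driver; the text-oriented path uses only the standard json module.

```python
import json


def get_field(doc, path):
    """Follow a dotted path such as 'address.city' into nested dictionaries."""
    current = doc
    for part in path.split("."):
        current = current[part]
    return current


def set_field(doc, path, value):
    """Set a nested field in place, creating intermediate objects as needed."""
    *parents, last = path.split(".")
    current = doc
    for part in parents:
        current = current.setdefault(part, {})
    current[last] = value


# Text-oriented interface: the whole document is parsed, edited, and re-serialized.
raw = '{"name": "Ada", "address": {"city": "Austin"}}'
parsed = json.loads(raw)
parsed["address"]["city"] = "Dallas"
raw = json.dumps(parsed)

# Field-oriented interface: the caller names only the field it cares about.
doc = {"name": "Ada", "address": {"city": "Austin"}}
set_field(doc, "address.city", "Dallas")

print(get_field(doc, "address.city"))
```

Both paths end with the same document, but the field-oriented version keeps the calling code focused on the one value being changed.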
MongoDB supports idiomatic drivers in more than a dozen languages including Java, .NET, Ruby,
Node.js, Python, PHP, C, C++, C#, JavaScript, Go, Rust, and Scala. Dozens of other drivers are
supported by the developer community.
RESTful APIs
Some systems provide representational state transfer (REST) interfaces. This approach has the
appeal of simplicity and familiarity, but it is subject to the latencies inherent in HTTP. It
also shifts the burden of building an interface to the developers. Note that this interface is likely to
be inconsistent with other programming interfaces.
SQL-Like APIs
Some non-relational databases have attempted to add an SQL-like access layer to the database in
the hope that this will reduce the learning curve for developers and DBAs already skilled in SQL. It is
important to evaluate these implementations before serious development begins. Consider the following:
° Most of these implementations fall short compared to the power and expressivity of SQL
and will demand that SQL users learn a feature-limited dialect of the language.
° SQL-based BI, reporting, and ETL tools will not be compatible with a custom
SQL implementation.
° Although some of the syntax may be familiar to SQL developers, data modeling will not
be. Trying to impose a relational model on any non-tabular database will have adverse
consequences for performance and application maintenance.
Takeaways
° The maturity and functionality of APIs vary significantly across non-relational products.
° MongoDB’s idiomatic drivers minimize onboarding time for new developers and simplify
application development.
° Carefully evaluate the SQL-like APIs offered by non-relational databases to ensure they can
meet the needs of applications and developers.
Mobile Data
The performance of mobile applications is just as important as the performance of server-based
architectures. But mobile apps introduce the added challenge of not always being connected to
the network. Application developers need a solution for keeping all of their customers’ apps in
sync with the back-end database, no matter where they are in the world and what kind of network
connection they have. The solution also needs to scale easily and quickly as more users download
an app, and to support the cutting edge of mobile development technologies as they evolve.
NoSQL databases — which are engineered to scale out on demand by leveraging less expensive
commodity hardware or cloud infrastructure — are ideally suited to the extra demands placed on
the back end by mobile applications that sync to it.
Schema Flexibility
Relational databases limit development because of their fixed schemas. Because new features are
always being added in mobile apps, making changes in relational databases for new situational
relationships becomes increasingly time-consuming. Mobile applications also present more use
cases than relational databases are designed to handle, including device type, operating system,
firmware, and location. For NoSQL databases, adding features or updating objects to account for
new use cases is simply a matter of entering new lines of code. NoSQL databases also are ideal
for handling frequent application updates that are a continual part of the app development life
cycle. There’s no need to overhaul the logic just to fix a bug. And making changes in one part of the
database is not likely to affect other parts of the application.
MongoDB Realm is a mobile database that allows customers to store data on mobile phones or
on devices at the edge. Its data model is similar to MongoDB’s in that it allows mobile developers to
store data as objects, just as they work with them in their code. The Realm Mobile Database also
works cross-platform, which means users do not have to rely on one technology for Android and
another for iOS.
Edge-to-Cloud Synchronization
MongoDB Realm Sync allows customers to keep data in sync between mobile devices and the
back-end database, despite intermittent connectivity. All data changes are stored in a unified history
and are automatically merged via timestamps and operational transformation. This edge-to-cloud
synchronization service comes with built-in conflict resolution and solves one of the most difficult
obstacles to building great mobile experiences. Data is always eventually consistent, and app
reliability is guaranteed. When end users don’t have a connection, they can still use their apps
because they have their data stored locally on the Realm Mobile Database. When their connection
is reestablished, their data on the local device is refreshed with updates from the back-end
database in the cloud and vice versa.
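The timestamp-based part of this merge can be illustrated with a deliberately simplified last-write-wins sketch. It is an assumption-laden illustration, not the Realm Sync algorithm: operational transformation is not modeled, and the merge_changes function, the change-log format, and the sample data are all hypothetical.

```python
def merge_changes(base, *change_logs):
    """Merge per-field changes from several sources; the latest timestamp wins."""
    merged = dict(base)
    latest = {}  # field -> timestamp of the change currently applied
    for log in change_logs:
        for timestamp, field, value in log:
            if field not in latest or timestamp > latest[field]:
                merged[field] = value
                latest[field] = timestamp
    return merged


server_copy = {"title": "Groceries", "done": False}

# Changes recorded while each side worked independently: (timestamp, field, value).
device_log = [(105, "done", True), (110, "title", "Groceries and snacks")]
cloud_log = [(108, "title", "Weekly groceries")]

result = merge_changes(server_copy, device_log, cloud_log)
print(result)
```

The offline device's later edit to title wins over the earlier cloud edit, and its change to done is kept because nothing else touched that field. With distinct timestamps the outcome is the same regardless of the order in which the logs are processed, which is what lets both sides converge once connectivity returns.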
Takeaways
° The same flexible data model, higher scalability, and superior performance found in
NoSQL databases for server environments make NoSQL an ideal solution for mobile
applications and data.
° NoSQL databases are engineered to scale out on demand by leveraging less expensive
commodity hardware or cloud infrastructure.
° The lack of rigid relational schemas makes NoSQL development more agile and better
equipped to add new features, update apps, and fix bugs without having to overhaul the
entire database.
° Realm is a NoSQL database with a built-in data synchronization service that keeps
data up to date whenever devices are online.
Data Platform
Relational databases have a long and successful history of running with proprietary software and
hardware as part of an on-premises ecosystem of applications, servers, and endpoints. But modern
infrastructure has moved to the cloud. Server workloads are widely distributed across multi-cloud
architectures that continually expand the edge of the network far beyond the confines of traditional
on-premises environments. Widely distributed workloads place high demands on databases that
must fulfill their role as the single source of truth, where truth is measured in microseconds. The
simplicity of NoSQL databases makes them better suited for the velocity and volume of modern
data transactions. And their portability enables organizations to transform the traditional centralized
data repository into a highly flexible and responsive data platform capable of distributing workloads
closer to where applications need them.
As data privacy regulations expand to include data sovereignty requirements, and local application
servers require the most relevant data to be close by to ensure low-latency reads and writes,
organizations need more control over where they deploy their data. MongoDB Atlas is a fully
managed cloud database that gives organizations this level of control over where they deploy their
data, whether for regulatory or for performance purposes.
Atlas provides fully integrated full-text search, eliminating the need for a separate search engine.
A flexible local datastore offers seamless edge-to-cloud sync for mobile and IoT devices. You can
perform in-place, real-time analytics with workload isolation and native data visualization. You can
also run federated queries across operational or transactional databases and cloud object storage.
Finally, Atlas supports global data distribution, both to meet data sovereignty requirements and to
speed up access by keeping data closer to where it’s being used.
Database as a Service
A modern data platform enabled through a database-as-a-service capability, such as MongoDB
Atlas, gives developers the freedom and flexibility to work seamlessly with data wherever their
applications and users need it, and to build integrated search features on top of cloud data across
all the major public cloud platforms. Rather than rigid tabular schemas and complex relationships,
Atlas provides a fully elastic data infrastructure that can be updated as needed via idiomatic
drivers that developers are already familiar with. This allows developers more time to focus on
their applications rather than managing databases themselves. MongoDB Atlas also offers
industry-leading data-privacy controls with client-side field-level encryption, and it enables you to
deploy workloads across clouds in nearly 80 regions.
Takeaways
° Modern multi-cloud environments require flexibility, speed, and elasticity not found in
relational databases with tabular schemas.
° Rigid tabular structures lead to the Data and Innovation Recurring Tax (DIRT).
° Distributed databases deployed in the cloud to the edge of the network give organizations
the ability to create a resilient, high-availability data platform that puts data closer to the
applications that need it.
° Database-as-a-service capabilities allow developers to spend less time managing databases
and more time building applications and rich query experiences.
Final Considerations
A database is a major investment. Once an application has been built on a given database, it is
costly, challenging, and risky to migrate it to a different database. Companies usually invest in a
small number of core technologies so they can develop expertise, integrations, and best practices
that can be amortized across many projects. Non-tabular databases are still a relatively emergent
technology. Although there are many new options in the market, only a subset of technologies and
vendors will stand the test of time.
Commercial Support
Consider the health of the vendor or product when evaluating a database. It is important not only
that the product continues to exist, but also that it evolves and adds new features as the needs
of users dictate. Having a strong, experienced support organization capable of providing services
globally is another relevant consideration.
Community Strength
There are significant advantages to having a strong community around a technology, particularly
databases. A database with a strong community of users makes it easier to find and hire developers
who are familiar with the product. It makes it easier to find best practices, documentation, and code
samples, all of which reduce risk in new projects. It also helps organizations retain key technical
talent. Finally, a strong community encourages other technology vendors to develop integrations
and participate in the ecosystem.
Takeaways
° Community size and commercial strength are important for evaluating non-relational
databases.
° MongoDB is one of the very few non-relational database companies to be publicly traded.
It has the largest and most active community, its support teams around the world provide
24/7 coverage, it has user groups in most major cities, and it provides extensive
documentation.
° MongoDB is available to run on your own infrastructure or as a fully managed cloud service
on all of the leading public cloud platforms.
Why MongoDB?
As the technology landscape evolves, organizations increasingly find the need to evaluate new
databases to support changing application and business requirements. Considering the media hype
around non-tabular databases and the commensurate lack of clarity in the market, it’s important
to make clear distinctions between the available solutions when possible. As discussed in this
white paper, there are several key criteria to consider when evaluating these technologies. Many
organizations find that document databases such as MongoDB are best suited to meet these
criteria, though we encourage decision-makers to evaluate the considerations for themselves.
Resources
For more information, please visit MongoDB.com