SQL Server Whitepaper
Relational to NoSQL: Getting Started From SQL Server
Why the shift to NoSQL?
As enterprises modernize, teams have to build and maintain applications more rapidly and at greater scale. Applications must be resilient and available whether their clients are web, mobile, or the Internet of Things (IoT). If any of these channels fails, customers go elsewhere. Today, enterprises in every industry, from travel to technology to retail and services, are leveraging NoSQL database technology for more agile development, reduced operational costs, and scalable operations. For many, the use of NoSQL started with a cache, proof of concept (POC), or small application, then expanded to targeted mission-critical applications. NoSQL has now become a foundation for modern web, mobile, and IoT application development.
Some of the largest internet and enterprise companies are using NoSQL technology to deploy their mission-critical applications. Examples include:
• Gannett, publisher of USA Today and 90+ media properties, replaced relational database technology with NoSQL to power its digital publishing platform
• Marriott deployed NoSQL to modernize its hotel reservation system that supports $38 billion in annual bookings
• FHL Bank Topeka integrates NoSQL with SQL Server to speed up access to customer financial data for its 770 member banks
• Cars.com, with over 30 million visits per month, replaced SQL Server with NoSQL to store customer and vehicle data
Couchbase has enabled hundreds of enterprises, growth companies, and startups to deploy NoSQL for better flexibility, fast performance, and affordable costs. The goal of this paper is to help you introduce NoSQL into your organization by highlighting lessons learned from teams that successfully adopted NoSQL. We’ll explore key considerations and strategies for transitioning to NoSQL, in particular to a document database (Couchbase Server or Couchbase Capella™ DBaaS), with tips for moving from SQL Server and other relational databases. Note that there are use cases in which NoSQL is not a replacement for, but a complement to, existing infrastructure, facilitating the use of polyglot persistence.
We’ll start with recommendations for identifying and selecting the right application. Next, we’ll cover strategies for modeling relational data as documents, how to access them within your application, and how to migrate data from a relational database. Finally, we’ll highlight the basics of operating a NoSQL database in comparison to a relational database.
Top 5 reasons companies replace SQL Server with a NoSQL database
If you’re running into limits with Microsoft SQL Server (or
other relational databases) – either in terms of rising costs
and complexity, or in scaling to meet your requirements –
this is the time to evaluate a NoSQL database.
Why the shift to NoSQL?
Many enterprises have successfully introduced NoSQL by identifying a single application or service to start with. It could be a new one that is being developed, or an existing application that’s being refactored. Examples include:
• A high performance, highly available caching service
• A small, independent application (or microservice) with a narrow scope
• A global service that powers multiple applications across multiple regions
Some common examples of good use cases for NoSQL:
• Product catalog service
• Asset tracking service
• Content management service
• Application configuration service
• File or streaming metadata service
Modeling and migrating your data
Couchbase Server is a document database – data is stored in JSON document collections, instead of tables. While relational
databases rely on an explicit predefined schema to describe the structure of data, document databases do not – JSON
documents are self-describing. As such, every JSON document includes its own flexible schema, and it can be changed on
demand by changing the document itself.
It’s important to understand that JSON documents are not limited to primitive fields. They can include arrays and objects,
and they can be nested, just like applications. For this reason, there is no “impedance mismatch” between application
objects and JSON documents. No complex object-relational mapping (ORM) solution is required.
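As a minimal sketch of this point, the Python snippet below serializes a nested application object straight to a JSON document using only the standard library. The `User` and `Address` classes and their fields are hypothetical, chosen for illustration; they do not come from any Couchbase SDK.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Address:
    street: str
    city: str


@dataclass
class User:
    name: str
    emails: list[str] = field(default_factory=list)         # array of primitives
    addresses: list[Address] = field(default_factory=list)  # array of nested objects


def to_document(user: User) -> str:
    """Serialize the object graph to a JSON document, preserving nesting."""
    return json.dumps(asdict(user), sort_keys=True)


user = User(
    name="Shane",
    emails=["shane@example.com"],
    addresses=[Address(street="1 Main St", city="Topeka")],
)
document = to_document(user)
print(document)
```

The nested list of addresses survives the round trip as a nested JSON array, with no mapping layer between the object and the document.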
Just as every row in a table requires a primary key, every document requires an object ID. Many applications rely on relational
databases to automatically generate the primary key (for example, with the IDENTITY columns in Microsoft SQL Server).
Document databases can use unique keys like UUID/GUID, but applications can also use natural keys where possible.
In a relational database, primary keys are defined per table. It’s not uncommon for the primary key of different rows in different
tables to have the same value. After all, a row is identified by its table and primary key. However, document databases do
not store documents in tables; in Couchbase Server, they’re stored in collections (which are then stored in scopes and then
buckets). You can store any type of document within a collection, but typically you will store similar types of documents within
the same collection (i.e., very similar to using a relational table).
Figure: Couchbase Server data containers (cluster, bucket, scope)
The benefit of using natural keys with a document database is that a document can be identified by an object ID, even if the
collection stores different types of documents.
For example, consider a single collection with blogs, authors, and comments stored in separate documents:
author::shane
author::shane::blogs
blog::nosql_fueled_hadoop
blog::nosql_fueled_hadoop::comments
These object IDs not only enable the collection to store related documents; they’re also human readable, deterministic, and semantic. In addition, an application can construct these keys easily to fetch or query using them. Even if you decided to put these four documents into four separate collections, these keys will still be beneficial.
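Constructing such keys in application code can be sketched as follows. The helper name `doc_key` is hypothetical; only the `::` separator and the four example keys come from the text above.

```python
def doc_key(doc_type: str, doc_id: str, *suffixes: str) -> str:
    """Join the parts of a natural key with the '::' separator used above."""
    return "::".join((doc_type, doc_id, *suffixes))


author_key = doc_key("author", "shane")
author_blogs_key = doc_key("author", "shane", "blogs")
blog_key = doc_key("blog", "nosql_fueled_hadoop")
blog_comments_key = doc_key("blog", "nosql_fueled_hadoop", "comments")

print(author_key, author_blogs_key, blog_key, blog_comments_key, sep="\n")
```

Because the keys are deterministic, an application that knows an author or blog identifier can compute the key of every related document without a lookup.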
A document can be modeled after a row (flat), or it can be modeled after related rows in multiple tables (nested). However,
documents should be modeled based on how applications interact with the data. While some documents may contain nested
data, others may reference it.
Figure 4: Related vs. nested JSON documents
Reference or nest related data?
There are several things to consider when deciding how to model related data: the type of relationship, how the data is read, and how it is written.
If it’s a one-to-one or one-to-many relationship (a child has one parent), it may be better to store the related data as nested
objects. This approach results in a simple data model and reduces or eliminates the need to query multiple documents.
However, if it’s a many-to-one or many-to-many relationship (a child has multiple parents), it may be better to store the related
data as separate documents, which reduces or eliminates the need to maintain duplicate data.
If a majority of the reads are limited to parent data (e.g., first and last name), it may be better to model the children (e.g.,
addresses and accounts) as separate documents. This results in better performance, because the data can be read with a
single key-value operation instead of a query, and reduces bandwidth, because the amount of data being transferred is smaller.
However, if a majority of the reads include both parent and child data, it may be better to model the children as nested objects.
This approach results in great performance because the data can be read with a single key-value operation instead of a query.
If a majority of the writes are to the parent or child, but not both, it may be better to model the children as separate documents.
For example, if user profiles are created with a wizard – first add info, then add addresses, finally add accounts – or if a user can
update an address or account without updating their info. However, if a majority of writes are to parent and child (both) – for
example, there’s a single form to create or update a user – it may be better to model the children as nested objects.
• One-to-one or one-to-many? Nest.
• Many-to-one or many-to-many? Don’t nest.
• Most reads are for parent data? Don’t nest.
• Most reads are for parent and child together? Nest.
• Most writes are for parent or child? Don’t nest.
• Most writes are for parent and child together? Nest.
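The read-side trade-off can be sketched with plain Python dictionaries standing in for JSON documents and for a key-value store. The keys (`user::100` and its suffixes) and the field names are hypothetical.

```python
# Nested model: one document holds the parent and its children.
nested_store = {
    "user::100": {
        "first_name": "Ada",
        "last_name": "Lovelace",
        "addresses": [{"type": "billing", "state": "KS"}],
        "accounts": [{"type": "Visa", "last4": "4242"}],
    }
}

# Referenced model: children live in their own documents.
referenced_store = {
    "user::100": {"first_name": "Ada", "last_name": "Lovelace"},
    "user::100::addresses": [{"type": "billing", "state": "KS"}],
    "user::100::accounts": [{"type": "Visa", "last4": "4242"}],
}


def read_profile_nested(store: dict, user_key: str) -> dict:
    """Parent and children arrive in a single key-value read."""
    return store[user_key]


def read_profile_referenced(store: dict, user_key: str) -> dict:
    """Parent and children need one read per document."""
    profile = dict(store[user_key])
    profile["addresses"] = store[f"{user_key}::addresses"]
    profile["accounts"] = store[f"{user_key}::accounts"]
    return profile


# Both models can produce the same full profile...
full_nested = read_profile_nested(nested_store, "user::100")
full_referenced = read_profile_referenced(referenced_store, "user::100")
# ...but a parent-only read transfers less data in the referenced model.
parent_only = referenced_store["user::100"]
```

In this sketch the nested model needs one read for the full profile and the referenced model needs three, while a parent-only read in the referenced model returns just the name fields.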
Finally, it may be better to model children as separate documents to reduce document size and write contention. For example,
the number of reviews on a product may grow indefinitely. If they were embedded, the size of the product document could
become excessive, resulting in slower reads. Consider a blog and comments. When a blog is first published, there may be a
lot of readers posting comments. If the comments are embedded, many concurrent users will try to update the same blog
document at the same time, resulting in slower writes. A good compromise may be to store comments as separate threads –
a document for every top-level comment that embeds all replies.
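One way to picture the thread compromise is the sketch below, where each top-level comment becomes its own document and replies are embedded in it. The key layout (`<blog key>::thread::<n>`) and the field names are assumptions for illustration.

```python
def new_thread(blog_key: str, thread_no: int, author: str, text: str) -> tuple[str, dict]:
    """Create one document per top-level comment; replies are embedded in it."""
    key = f"{blog_key}::thread::{thread_no}"
    return key, {"author": author, "text": text, "replies": []}


def add_reply(thread: dict, author: str, text: str) -> None:
    """A reply only touches its own thread document, not the blog document."""
    thread["replies"].append({"author": author, "text": text})


store = {}
key1, thread1 = new_thread("blog::nosql_fueled_hadoop", 1, "ann", "Great post")
key2, thread2 = new_thread("blog::nosql_fueled_hadoop", 2, "bob", "A question")
store[key1], store[key2] = thread1, thread2
add_reply(store[key1], "shane", "Thanks!")
```

Writers replying in different threads update different documents, so they do not contend for the blog document, and no single document grows with every comment on the post.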
Figure 5: Different documents of the same type can have different schema
Performing a migration?
The easiest and fastest way to get started is to export your relational data to CSV files and import them into Couchbase Server. This may not represent the final data model, but it will enable you to start interacting with Couchbase Server right away. Couchbase Server includes a command-line utility, cbimport, for importing data in CSV files.
$ cbimport csv -c couchbase://127.0.0.1 -u Administrator -p password -b default --scope-collection-exp myscope.products -d file:///products.csv -g %id% -t 4
NOTE: Same object, different structure? As illustrated in figure 5, it’s possible for the same field or object to have a different structure in different documents.
Understanding your access patterns
NoSQL databases provide data access via key-value APIs, SQL++ query APIs, or SDKs. The key-value API provides the best performance, since it will often be served directly from an in-memory cache. The query API or language provides the most power and flexibility, enabling applications to sort, filter, transform, group, and combine documents. Queries to Couchbase are done using SQL, just like in a relational database (with extensions for JSON, hence “SQL++”).
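The difference between the two access styles can be sketched in plain Python, with a dictionary standing in for the database. This is an analogy only: the document keys and fields are hypothetical, and no Couchbase SDK call is shown.

```python
from collections import Counter

documents = {
    "user::1": {"name": "Ann", "status": "Platinum", "state": "KS"},
    "user::2": {"name": "Bob", "status": "Gold", "state": "CA"},
    "user::3": {"name": "Cy", "status": "Platinum", "state": "CA"},
}

# Key-value access: one direct lookup when the key is already known.
ann = documents["user::1"]

# Query-style access: filter, sort, and group across many documents.
platinum = sorted(
    (doc for doc in documents.values() if doc["status"] == "Platinum"),
    key=lambda doc: doc["name"],
)
users_per_state = Counter(doc["state"] for doc in documents.values())
```

The lookup touches exactly one document, while the filter and the grouping have to consider every document unless an index narrows the search.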
Key value
The key-value API can provide a great deal of data access without the need to perform queries. In the example below, once you
have the object ID of the user profile document, you can figure out what the object IDs of the address and account documents are.
Query
The query API or language, combined with proper indexing, can provide a great deal of power and flexibility without sacrificing performance. Couchbase Server provides a SQL++ implementation called N1QL, which extends SQL to JSON documents. N1QL also has support for ANSI joins, making it easier for developers to apply their SQL knowledge to develop applications within Couchbase.
While one of the benefits of storing related data as separate documents is the ability to read a subset of the data (e.g., shipping address), the same thing can be accomplished with a query API or language when related data is nested. For example, a query can read only the billing address from a user profile document that stores all related data as nested objects.
In addition, while one of the benefits of storing related data as nested objects is the ability to access all data with a single read, the same thing can be accomplished with a query API or language when related data is stored as separate documents. For example, a query can read the user profile together with its accounts and addresses when they are stored as separate documents.
The query language can be used to perform CRUD operations as an alternative to the key-value API. This enables applications built on top of a relational database to migrate all data access by replacing SQL statements with SQL++ statements. One of the advantages of performing CRUD operations with SQL++ is the ability to perform partial updates.
N1QL abstracts the data model from the application model. Regardless of how data is modeled in the database, applications
can query it any way they need to by joining documents, nesting and unnesting them, and more. It provides developers with
the flexibility they need to model data one way and query it in many.
Indexing your data
Query performance can be improved by indexing data. NoSQL databases support indexes to varying degrees
– Couchbase Server includes comprehensive indexing support. Below are some indexing examples.
QUERY
SELECT COUNT(status)
FROM users
WHERE status = "Platinum";
A partial index can cover, for example, the billing state of only those users who have a Visa credit card.
Couchbase Server supports index intersection. A query can scan multiple indexes in parallel. As a result, it may not be
necessary to create multiple indexes that include the same field, thereby reducing both disk and memory usage. You can
also index a large number of documents and horizontally scale out an index as needed. The system will transparently
partition the index across a number of index nodes using hash partitioning and will increase the performance and data
capacity of the cluster. For more index examples, see the Couchbase Server documentation.
Connecting to the database
Applications access data in NoSQL databases via clients. Couchbase Server SDKs are all topology-aware clients (e.g., smart
clients) available in many languages: Java, Node.js, PHP, Python, C, and more. These clients are configured in much the same
way JDBC/ODBC drivers are configured.
A bucket is a higher-level abstraction than a connection, and a cluster can contain multiple buckets. In the example above, the application can access data in the users bucket. However, while key-value operations are limited to the users bucket, SQL++ queries are not.
Couchbase Server is a distributed database, but applications do not have to pass in the IP address of every node. However, they should provide more than one IP address so that if the first node is unavailable or unreachable, they can try to connect to the next node. After the client connects to a node, it will retrieve the IP address of the remaining nodes.
Couchbase Server clients also maintain a cluster map, which enables them to communicate directly with nodes.
In addition, the cluster map enables operations teams to scale out the database without impacting the application. Regardless of the number of nodes, the application sees a single database. There are no application changes required to scale from a single node to dozens – the clients are automatically updated.
In addition, with Couchbase Server, applications no longer have to rely on object-relational mapping frameworks for data access, because there is no impedance mismatch between the data model and the object model. In fact, domain objects are optional. Applications can interact with the data via document objects or by serializing domain objects to and from JSON.
Figure 7: Working with domain objects vs. document objects
NOTE: In addition to the clients, there are supported, certified JDBC/ODBC database drivers available for
developers can change the data model without having to change the application model. For example, you can add a
Transactions
Similar to a relational database, Couchbase offers ACID transaction support. Transactions must be initiated by the Couchbase SDK, and can include key-value operations and/or SQL++ operations.
Transactions are not required when changing data in a single document, because changes within a document are atomic. Take this into account when designing new data structures, since atomic changes to a single document are lighter weight than transactions.
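try
{
    await _transactions.RunAsync(async (ctx) =>
    {
        // 'ctx' is an AttemptContext, which permits getting, inserting,
        // removing and replacing documents, along with committing and
        // rolling back the transaction.
        // (The transaction body is cut off in the source excerpt.)
    });
}
catch (Exception)
{
    throw; // Placeholder: error handling is not shown in the source.
}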
Installing and scaling your database
One of the key advantages driving the adoption of NoSQL databases with a distributed architecture is their ability to scale faster, more easily, and at a significantly lower cost than relational databases. While most relational databases are capable of clustering (e.g., Microsoft SQL Server), they are still limited to scaling up – failover clustering relies on shared storage, while Always On availability groups are limited to replication. As a result, more data requires a bigger disk, and more users require a bigger server. The shared storage not only becomes a bottleneck, it becomes a single point of failure. In contrast, most NoSQL databases are distributed to scale out – more data requires more disks, not a bigger one, and more users require more servers, not a bigger one.
Figure: Scale up vs. scale out (applications and local storage)
Couchbase Server’s topology-aware clients and consistent hashing distribute data within a cluster automatically. Data can be
replicated to one or more nodes to provide high availability, also automatically.
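The general idea of hashing keys to partitions and mapping partitions to nodes can be sketched in a few lines of Python. This is a simplification for illustration: the modulo mapping, the round-robin assignment, the vBucket count, and the node names are all assumptions, not the exact scheme Couchbase Server uses.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024  # assumption for this sketch
NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def vbucket_for(key: str) -> int:
    """Hash the document key with CRC32 and map it to a vBucket."""
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(nodes: list[str]) -> dict[int, str]:
    """Assign each vBucket to a node round-robin (a stand-in for the real map)."""
    return {vb: nodes[vb % len(nodes)] for vb in range(NUM_VBUCKETS)}


def node_for(key: str, cluster_map: dict[int, str]) -> str:
    """A client uses the map to send an operation straight to the owning node."""
    return cluster_map[vbucket_for(key)]


cluster_map = build_cluster_map(NODES)
keys = [f"user::{i}" for i in range(10_000)]
per_node = Counter(node_for(k, cluster_map) for k in keys)
print(per_node)
```

In this sketch, adding a node changes only the vBucket-to-node map; the key-to-vBucket hash stays the same, so a client needs nothing more than an updated map.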
Installing Couchbase Server
Fully managed
• Automated setup, backups, upgrades, and ongoing management to deliver an always-on service, reducing your operational efforts.
Self-monitoring, self-healing
• Capella proactively monitors clusters 24/7 to locate, assess, and resolve issues automatically.
Figure: CRC32 hashing algorithm and vBuckets (vBucket1 to vBucket7)
Monitoring and managing your deployment
While many relational and NoSQL databases require separate administration tools, Couchbase Server includes an integrated, comprehensive administration console as well as REST and CLI APIs.
The administration console and the REST/CLI APIs enable administrators to manage and monitor clusters, both small and large, with minimal effort. Functionality enabled through the Couchbase admin console includes:
• Cluster/node/bucket/views
Tasks
• Failover nodes
• Rebalance cluster
Configuration
• Auditing
• Audit
• Monitor
In addition to the administration console and APIs, Couchbase Server includes a number of command-line tools to perform additional tasks, including:
• cbbackupmgr – Full, cumulative, and incremental backup and restore
• cbcollect_info and cbstats – Gather node and cluster diagnostics (280+ metrics)
• cbq – Run SQL++ queries from the command line
• cbimport/cbexport – Transfer data to and from JSON or CSV files
Putting it all together:
How to conduct a successful proof of concept (POC)
Now that you’re familiar with the key considerations and strategies for transitioning from a relational database to a NoSQL database – how to select an application, how to model and access the data, and how to deploy the database – you’re ready to start a proof of concept.
Couchbase solutions engineers have helped, and continue to help, many enterprises successfully introduce NoSQL, from planning all the way to post-production. We encourage everyone to start with a proof of concept.
There are five steps to a successful proof of concept:
3. Understand the data
Before defining the data model, simply understand the
could work better – use this knowledge to help define the final application architecture.
NoSQL success offers rich rewards
If you’re ready to take the next steps and are looking for more
specific advice, we invite you to talk with one of our solutions
engineers. At a minimum, you’ll probably get some helpful
insights and best practices for your particular use case.
At Couchbase, we believe data is at the heart of the enterprise. We empower developers and
architects to build, deploy, and run their mission-critical applications. Couchbase delivers a high-
performance, flexible and scalable modern database that runs across the data center and any
cloud. Many of the world’s largest enterprises rely on Couchbase to power the core applications