NoSQL and Data Scalability 2.0: Amazon DynamoDB

INTRODUCTION
This NoSQL Refcard provides an easy-to-understand and useful set
of information about the range of NoSQL databases available today.

SCALABLE DATA ARCHITECTURES
Scalable data architectures have evolved to improve overall system
efficiency and reduce operational costs. Specific NoSQL databases
may have different topological requirements, but the general
architecture is the same.

NoSQL is short for Not Only SQL, and refers to the fact that
non-relational data can benefit from multiple different query
mechanisms. The Structured Query Language (SQL) of relational systems
is also supported by many NoSQL databases. This is useful for access
from legacy software platforms, including Business Intelligence (BI)
tools that do not support NoSQL databases natively.

NOSQL DATABASES CLASSIFICATION
There are four key types of NoSQL databases. The simplest are also
the fastest, so there is a trade-off on functionality when using a
key-value store. The four types are below:

DATABASE CLASS: Document
BRIEF DESCRIPTION: Stores hierarchical JSON data. Some support XML and
other formats. Maps very well to programming language object graphs.
Most popular NoSQL database option with developers. Typically paired
with a search engine to handle complex unstructured text.
PRODUCT EXAMPLES: MongoDB, MarkLogic, CouchDB, Couchbase, ArangoDB,
OrientDB, Microsoft CosmosDB, IBM Cloudant, Amazon DynamoDB

DATABASE CLASS: Triple or Graph
BRIEF DESCRIPTION: Very simple structures in a directed graph. Each
piece of data is a triple: Subject, Predicate, and Object. This
technology underpins the Semantic Web. Triple stores are used to store
webs of information with semantic inferencing, while graph stores are
used for minimum distance (e.g. route planning applications) and other
graph traversal problems.
PRODUCT EXAMPLES: Neo4j, GraphDB, Allegrograph, MarkLogic, OrientDB,
ArangoDB

DATABASE CLASS: Hybrid, or multi-model
BRIEF DESCRIPTION: Supports two or more of the above types of data.
Most common pairing is document and triple/graph stores.
PRODUCT EXAMPLES:
Document/Triple: MarkLogic
Document/Graph: OrientDB, ArangoDB
Document/Columnar: Microsoft CosmosDB
Key-value/Document: Amazon DynamoDB

While all database types are in common use, document stores are most
often associated with NoSQL systems due to their pervasiveness in web
and mobile content handling applications.

IS NOSQL FOR YOU?
Does your app design...
- Need to handle varying data structures (schema), or have schema that
  you do not control?
- Require high-speed throughput?
- Need to handle high volumes of data?
- Work well with weak data consistency, or need different consistency
  models at different times?
- Benefit from direct object-database entity mapping?
- Is operational, and not batch (unlike Hadoop applications)?

If you checked off four or more items from the list, then NoSQL is a
good fit for you.

NOSQL TRADE OFFS
The total cost of ownership (TCO) is often lower for NoSQL databases
than relational databases. This is thanks mainly to two things. First,
many NoSQL databases have an open source core. Second, they scale out
on commodity hardware, i.e. a very large dataset does not require a
very powerful, and very expensive, single computer. Instead you can
use multiple, small computer servers or, even better, scale out in a
virtualized cloud infrastructure like Amazon Web Services (AWS).

I've put together a few data points that illustrate the trade-offs.
I've included relational databases for comparison. Note these show
relative scores compared to each other, not absolute scores in real
terms.

                                        RELATIONAL  KEY-VALUE  COLUMNAR  DOCUMENT  TRIPLE/GRAPH
Data model complexity                   Medium      Low        Medium    High      High
Breadth of data model applicability     Low         Medium     Medium    High      High
Ease of minimum schema change           Low         Very high  Medium    High      Very high
Performance                             Medium      Very high  High      Medium    Highly variable, query dependent
Scale out cost                          High        Low        Low       Low       Varies on architecture (sharded: Low, unsharded: High)
TCO for very large operational volumes  High        Low        Medium    Medium    Varies on architecture (sharded: Medium, unsharded: High)

Figure 2: Complexity and TCO

Document and key-value stores are most popular because of their ease
of use, flexibility, and applicability across many problem domains, at
a reasonable TCO.

Tip: Graph databases are excellent replacements for complex relational
models because relationships between entities (or graph edges) are
more efficient and better suited for high-performance applications
than using explicit joins and foreign keys. This is especially the
case for computationally complex graph traversal algorithms such as
minimum distance or subgraph comparison.

Tip: Many NoSQL vendors make over 50% of their revenue from
consultancy. Be sure to interrogate suppliers for full project
consultancy costs for your final analysis of TCO. Consultancy rates up
to USD $2000 per day are possible for some NoSQL databases. NoSQL
vendor-trained System Integration (SI) partners are a good source of
experienced yet reasonably priced consultancy.

WHICH DATA MODEL TO USE?
The flowchart in Figure 3 describes how to choose the most
appropriate database or store for the application.

Figure 3: Choosing the Right Data Store

HYBRID OR MULTI-MODEL DATABASES
Many NoSQL databases are moving toward supporting multiple models.
This means they may be key-value stores that also support the storage
and querying of JSON documents, such as Amazon DynamoDB.

Other NoSQL databases are supporting both the document and graph or
triple store models. Examples of these include MarkLogic Server,
ArangoDB, and OrientDB.

Which you choose depends primarily on how you query the data, as you
can see in Figure 3. Start with what questions you are going to ask of
your data, and then look at the most convenient storage model, such as
cells (with column families, perhaps) or more hierarchical JSON
documents.

If in doubt, start with a simple database structure that also has
support for secondary indexes. Amazon DynamoDB is a good candidate
here, as it stores simple JSON values natively in its key-value store,
but also provides secondary indexes to pull back records and data
summaries, much like more complex document stores do.

CLOUD DATABASES
Demand-based scaling is an attractive proposition for running NoSQL
systems on the cloud; it maximizes the advantages of running the
application on cloud-based providers like AWS, Microsoft Azure, or
Google Cloud.

- Database-as-a-Service (DBaaS) offers turnkey managed functionality,
  which delegates all operational responsibilities to the provider.
- Hosted VM databases are provisioned on virtual images, much like they
  would be on premises, and all operational responsibility belongs to
  the user. All NoSQL databases can be used in this way.

Some NoSQL databases are available as cloud-friendly turnkey DBaaS.
Some of these are listed below:
- Amazon DynamoDB on AWS
- Microsoft CosmosDB on Microsoft Azure

NOSQL IN PRACTICE
This section will use Amazon DynamoDB to illustrate key traits of
key-value stores, including real-life use cases and architectures.
Document database use cases are also covered briefly using DynamoDB,
thanks to its storage of JSON values and secondary indexes, allowing
record queries.

AMAZON DYNAMODB
DynamoDB is a key-value NoSQL database that supports eventual and
strong consistency. It is a very simple-to-use service, and can be run
standalone on a laptop or in the cloud on Amazon Web Services (AWS).

There are many use cases for DynamoDB specifically, and key-value
stores in general:
- Serving advertisements for web pages with sub-second response time
- Storing user preferences for a web site
- Storing temporary session information, such as a shopping cart

An example architecture for using DynamoDB as an ad-serving database
can be found at
https://media.amazonwebservices.com/architecturecenter/AWS_ac_ra_adserving_06.pdf

DynamoDB in particular is useful for web application developers, as it
has a friendly API with wrappers for Node.js, Java, and other
languages. It also stores and retrieves data in web app-friendly JSON
format.

This data can be retrieved by a row or partition key like other
key-value stores. You can also add secondary indexes to support
querying by different attributes. These indexes allow for more
sophisticated query mechanisms.

QUICK START GUIDE FOR DYNAMODB
This quick start guide is a modified version of the Amazon DynamoDB on
Node.js tutorial. The version presented below is a realistic web
application to search and retrieve movie information from DynamoDB and
present it on a web page.

This is the fundamental functionality of any web application, and
should allow you to get up and running for your own apps very quickly.

RUNNING DYNAMODB LOCALLY
Our first step is to download a copy of DynamoDB and run it locally.
There is a very simple tutorial to do this on Amazon's website:
http://docs.aws.amazon.com/amazondynamodb/latest/gettingstartedguide/GettingStarted.Download.htm

You download the .tar.gz or .zip for your platform, unzip the files,
and then execute the service. This assumes you have Java installed
locally.

I've created a folder called nodejs-dynamodb-sample. You can download
a complete copy of this from my GitHub page:
github.com/adamfowleruk/nodejs-dynamodb-sample

Click on Download Zip to get the full repository contents.

Within this file I've created a folder called ext that I've unpacked
the DynamoDB files into. You should do this yourself, now.

I've then created a shell script (Linux, Mac) and batch file (Windows)
that execute the code below:

    java -Djava.library.path=./DynamoDBLocal_lib -jar ./ext/DynamoDBLocal.jar -sharedDb -inMemory

For convenience you can open up a command prompt and just execute the
run-dynamodb-local.sh or .bat file.

Note: You can find all the code here on my GitHub site. You will have
to download DynamoDB yourself and unpack it into the ext folder before
running those files.

CREATING YOUR WEB APPLICATION WITH NODE.JS EXPRESS
First you'll need to download the DynamoDB SDK for Node.js. This
tutorial assumes you have a working Node.js environment. If you don't,
visit nodejs.org and download the latest version.

Next, ensure the Express module is installed on your system, globally.
This is not part of the GitHub download, so you must execute it
yourself.

    npm install -g express-generator

Download the sample app from GitHub, and unpack it. Now open a command
prompt and move to this folder:

    cd nodejs-dynamodb-sample

Now type:

    npm install

After a few minutes, all of your dependency files for this application
will be installed.

After a short while you will see the application running on port 3000.
Now open a browser to http://localhost:3000/

You will see a welcome page, and two search forms. These forms won't
work yet as we need to configure your AWS access for DynamoDB.

CONFIGURING AWS SECURITY
In order to use DynamoDB, you'll need to register for a free AWS
account, and generate an Access Key.

Register for an AWS account here: aws.amazon.com

Once registered and logged in, search for the IAM service and click on
it.

IAM is the AWS Identity and Access Management service. You will need
to create a user in order to store data in S3 and, later on, access
the DynamoDB service on AWS (we're using a local service for now on
your own computer).

Click on Create Individual IAM Users and then Manage Users. The page
will look like this.

Now click on Add User. Use a logical username. For an example, refer
to this image.

Now click on Next: Permissions and click on Create Group like so. This
will open up a new window. Configure a new named group with
AmazonS3FullAccess and AmazonDynamoDBFullAccess policies. Click Create
Group. You should see a summary like this.

Return to the Create User window in your browser, and click Next:
Review then Next: Complete.

Here you will see your access key, along with a secret key. Click on
Show then jot both the access key and secret key down somewhere safe.
Click on Done when you have finished.

Create a key file for AWS access. Create this file:
Linux users: ~/.aws/credentials
Windows users: C:\Users\USER_NAME\.aws\credentials

Now take the Access Key and Secret Key and, in this file, add them as
below:

    [default]
    aws_access_key_id = <YOUR_ACCESS_KEY_ID>
    aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>
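
To make the layout of the credentials file more concrete, here is a minimal sketch that parses text in that format into named profiles. The AWS SDK for Node.js reads this file by itself, so you would not normally write this; parseCredentials is a hypothetical helper used purely for illustration, and the key values are the same placeholders as in the file above.

```javascript
// Illustrative only: the AWS SDK loads ~/.aws/credentials on its own.
// parseCredentials is a hypothetical helper, not part of any AWS library.
function parseCredentials(text) {
  const profiles = {};
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines.
    if (line === '' || line.startsWith('#') || line.startsWith(';')) continue;
    // A "[name]" line starts a new profile section.
    const header = line.match(/^\[(.+)\]$/);
    if (header) {
      current = header[1].trim();
      profiles[current] = {};
      continue;
    }
    // Every other line is expected to be "key = value" inside a profile.
    const eq = line.indexOf('=');
    if (eq > 0 && current !== null) {
      profiles[current][line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
    }
  }
  return profiles;
}

// Placeholder content matching the file layout described in the text.
const sample = [
  '[default]',
  'aws_access_key_id = <YOUR_ACCESS_KEY_ID>',
  'aws_secret_access_key = <YOUR_SECRET_ACCESS_KEY>'
].join('\n');

const parsed = parseCredentials(sample);
console.log(Object.keys(parsed.default));
```

The [default] section name is the profile that is used when no other profile is requested, which is why the tutorial needs only that one section.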

You should see an output like this:

    adamfowookwork2:nodejs-dynamodb-sample adamfowler$ node MoviesCreateTable.js
    Created table. Table description JSON: {
      "TableDescription": {
        "AttributeDefinitions": [
          {
            "AttributeName": "year",
            "AttributeType": "N"
          },
          {
            "AttributeName": "title",
            "AttributeType": "S"
          }
        ],
        "TableName": "Movies",
        "KeySchema": [
          {
            "AttributeName": "year",
            "KeyType": "HASH"
          },
          {
            "AttributeName": "title",
            "KeyType": "RANGE"
          }
        ],
        "TableStatus": "ACTIVE",
        "CreationDateTime": "2017-08-13T09:52:48.719Z",
        "ProvisionedThroughput": {
          "LastIncreaseDateTime": "1970-01-01T00:00:00.000Z",
          "LastDecreaseDateTime": "1970-01-01T00:00:00.000Z",
          "NumberOfDecreasesToday": 0,
          "ReadCapacityUnits": 10,
          "WriteCapacityUnits": 10
        },
        "TableSizeBytes": 0,
        "ItemCount": 0,
        "TableArn": "arn:aws:dynamodb:ddblocal:000000000000:table/Movies"
      }
    }
    adamfowookwork2:nodejs-dynamodb-sample adamfowler$

Now visit localhost:3000/ and type in 1985 and A View to a Kill
(correct capitalization is very important) into the Get form. Click
Get.

Note that only one movie is shown.

Now go back to the index page, and type in a year in the Search form.
Click Search.

Express uses Jade for web page templating. To see what is happening,
read the following files:

1. The execution code for /movies in ./routes/movies.js
2. The display for the results in ./views/movies.jade

Note that there are two routes configured in movies.js: a GET route
and a POST route. Each route does something slightly different. The
first fetches a specific single movie; the second lists movies using
an indexed field.

From this basic example you can move on to create your own
application. You could use DynamoDB to:
- Store user information and site preferences for your website
- Store game data, high scores
- Store shopping cart or other temporary data
- Much, much more

For further details, read all the links in the Amazon DynamoDB for
Node.js documentation:
docs.aws.amazon.com/amazondynamodb/latest/gettingstartedguide/GettingStarted.NodeJs.html
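
The table description printed earlier and the two routes in movies.js correspond to three kinds of DynamoDB request: creating the table, fetching one item by its full key, and querying by the partition key. The sketch below only builds the request parameter objects, so it runs without the AWS SDK installed. The function names are hypothetical, and the sample app's actual route code may differ; the table name, key names, key types, and capacity values are taken from the table description output.

```javascript
// Hypothetical helpers: each returns a plain parameter object and makes
// no network call. Names and values mirror the table description output.

// Parameters for creating the Movies table: "year" is the partition
// (HASH) key and "title" is the sort (RANGE) key.
function buildCreateTableParams() {
  return {
    TableName: 'Movies',
    KeySchema: [
      { AttributeName: 'year', KeyType: 'HASH' },
      { AttributeName: 'title', KeyType: 'RANGE' }
    ],
    AttributeDefinitions: [
      { AttributeName: 'year', AttributeType: 'N' },
      { AttributeName: 'title', AttributeType: 'S' }
    ],
    ProvisionedThroughput: { ReadCapacityUnits: 10, WriteCapacityUnits: 10 }
  };
}

// Parameters for fetching exactly one movie: both key parts are required,
// and the title must match exactly, including capitalization.
function buildGetParams(year, title) {
  return { TableName: 'Movies', Key: { year: year, title: title } };
}

// Parameters for listing all movies from one year. "year" is a DynamoDB
// reserved word, so it is referenced through the "#yr" placeholder.
function buildQueryParams(year) {
  return {
    TableName: 'Movies',
    KeyConditionExpression: '#yr = :yyyy',
    ExpressionAttributeNames: { '#yr': 'year' },
    ExpressionAttributeValues: { ':yyyy': year }
  };
}

console.log(JSON.stringify(buildGetParams(1985, 'A View to a Kill'), null, 2));
console.log(JSON.stringify(buildQueryParams(1985), null, 2));
```

With the AWS SDK for JavaScript (version 2), objects like these would typically be passed to createTable on an AWS.DynamoDB client, and to get and query on an AWS.DynamoDB.DocumentClient. Treat that pairing as an assumption to check against the SDK documentation for the version you install.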

By clicking on Movies, you can view the items within the table on the
Items tab, access metrics from your application, and see estimated
monthly costs in the Capacity tab.

Adam Fowler is based in the UK and is the author of the books NoSQL
for Dummies and State of NoSQL 2016, and maintains an active blog on
Data Management and NoSQL at adamfowler.org. Adam also curates
important NoSQL news on his Twitter feed, @adamfowleruk.

Copyright 2017 DZone, Inc. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by means electronic, mechanical,
photocopying, or otherwise, without prior written permission of the
publisher.