Enter the Purpose-Built Database Era: Finding the Right Database Type for the Right Job
INTRODUCTION
IT leaders need to look for ways to get more value from their data. If you’re
running legacy databases on-premises, you’re likely finding that provisioning,
operating, scaling, and managing databases is tedious, time-consuming, and
expensive. You need modernized database solutions that allow you to spend
time innovating and building new applications—not managing infrastructure.
Moving on-premises data to managed databases built for the cloud can help
you reduce time and costs. Once your databases are in the cloud, you can
innovate and build new applications faster—all while getting deeper and
more valuable insights.
Migrating to the cloud is the first step toward entering the era of
purpose-built databases. But once in the cloud, how do you know which
types of databases to use for which functions? Read on to learn more about
purpose-built database types—and how you can ensure a smooth transition
into an era of innovation, performance, and business success.
WHY CHANGE?
Relational databases were designed for tabular data with consistent structure
and fixed schema. They work for problems that are well defined at the onset.
Traditional applications like ERP, CRM, and e-commerce need relational
databases to log transactions and store structured data, typically in GBs
and occasionally TBs.
While relational databases are still essential—in fact, they are still growing—a
“relational only” approach no longer works in today’s world.
With the rapid growth of data—not just in volume and velocity but also in
variety, complexity, and interconnectedness—the needs of databases have
changed. Many new applications that have social, mobile, IoT, and global access
requirements cannot function properly on a relational database alone.
These applications need databases that can store TBs to PBs of new types of
data, provide access to data with millisecond latency, process millions of requests
per second, and scale to support millions of users anywhere in the world.
In the following pages, we will examine a variety of database types, exploring the
strengths, challenges, and primary use cases of each.
At-a-glance
Quickly jump to info on different database types
Relational
Provides high integrity, accuracy, and consistency, limitless indexing
Useful for ERP, CRM, finance, transactions, and data warehousing
Graph
Create and traverse relationships within highly connected data sets
Useful for fraud detection, social networking, data lineage, and knowledge graphs
Key Value
Predictable low latency regardless of scale, flexible schema, optional consistency
Time Series
High scalability for data that accumulates quickly
RELATIONAL
Relational databases
Description
In relational database management systems (RDBMS), data is stored in a
tabular form of columns and rows and data is queried using the Structured
Query Language (SQL). Each column of a table represents an attribute, each
row in a table represents a record, and each field in a table represents a data
value. Relational databases are so popular because 1) SQL is easy to learn and
use without needing to know the underlying schema and 2) database entries
can be modified without specifying the entire body.
[Figure: example relational schema with five related tables]
Patient: Patient ID, First Name, Last Name, Gender, DOB, Doctor ID
Visit: Visit ID, Patient ID, Hospital ID, Date, Treatment ID
Doctor: Doctor ID, First Name, Last Name, Medical Specialty, Hospital Affiliation
Medical Treatment: Treatment ID, Procedure, How Performed, Adverse Outcome, Contraindication
Hospital: Hospital ID, Name, Address, Rating
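As a rough illustration of the row-and-column model described above, the following sketch builds two of the listed entities (Patient and Visit) in an in-memory SQLite database, updates a single field, and joins the tables on their shared key. The lower-case table and column names and all sample rows are assumptions made for this example; they are not taken from any real system.

```python
import sqlite3

# In-memory SQLite database. Table and column names loosely follow the
# Patient and Visit entities; the sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE visit (
        visit_id   INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patient(patient_id),
        visit_date TEXT NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO patient VALUES (?, ?, ?)",
    [(1, "Ana", "Lopez"), (2, "Raj", "Patel")],
)
conn.executemany(
    "INSERT INTO visit VALUES (?, ?, ?)",
    [(10, 1, "2020-01-15"), (11, 1, "2020-03-02"), (12, 2, "2020-02-20")],
)

# Modify one field of one row without restating the whole record.
conn.execute(
    "UPDATE patient SET last_name = ? WHERE patient_id = ?",
    ("Lopez-Diaz", 1),
)

# Join the two tables on the shared key to count visits per patient.
visits_per_patient = conn.execute("""
    SELECT p.first_name, p.last_name, COUNT(v.visit_id)
    FROM patient AS p
    JOIN visit AS v ON v.patient_id = p.patient_id
    GROUP BY p.patient_id
    ORDER BY p.patient_id
""").fetchall()
print(visits_per_patient)
```

The UPDATE statement touches only the last name, which is the kind of partial modification the description refers to.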
Advantages
Works well with structured data
Use cases
ERP apps
CRM
Finance
Transactions
Data warehousing
KEY VALUE
[Figure: example key-value table]
Primary key: partition key GameID, sort key Podium
Attributes: PlayerID
Example items: Podium "1. Gold" with PlayerID Hammer57; Podium "3. Bronze" with PlayerID Jam22Jam
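The partition key and sort key shown above can be pictured as a composite lookup key. The sketch below models that idea with a plain Python dictionary; it is only a toy model under stated assumptions, not how any particular key-value service is implemented. The function names `put_item`, `get_item`, and `query_partition`, the GameID values, and the extra Country attribute are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Composite primary key: (partition key, sort key) -> attributes.
Key = Tuple[str, str]
Item = Dict[str, str]
table: Dict[Key, Item] = {}


def put_item(game_id: str, podium: str, attributes: Item) -> None:
    """Store (or overwrite) the item identified by the composite key."""
    table[(game_id, podium)] = attributes


def get_item(game_id: str, podium: str) -> Optional[Item]:
    """Direct lookup by full primary key; returns None if absent."""
    return table.get((game_id, podium))


def query_partition(game_id: str) -> List[Tuple[str, Item]]:
    """Return every item in one partition, ordered by sort key."""
    return [
        (podium, table[(gid, podium)])
        for gid, podium in sorted(table)
        if gid == game_id
    ]


# "game-1" and "game-2" are invented GameID values.
put_item("game-1", "1. Gold", {"PlayerID": "Hammer57"})
put_item("game-1", "3. Bronze", {"PlayerID": "Jam22Jam"})
# Items need not share the same attributes (flexible, sparse schema).
put_item("game-2", "1. Gold", {"PlayerID": "Hammer57", "Country": "NZ"})

print(get_item("game-1", "1. Gold"))
print(query_partition("game-1"))
```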
Advantages
Scaling decoupled from CPU load of any single node, resulting in consistent
low latency regardless of throughput requirements
Schema flexibility allows for sparse storage and direct developer ownership
Use cases
Real-time bidding
Shopping cart
Product catalog
Customer preferences
DOCUMENT
Document databases
Description
In document databases, data is stored in JSON-like documents and JSON
documents are first-class objects within the database. These databases
make it easier for developers to store and query data by using the same
document-model format developers use in their application code.
[
  {
    "year": 2013,
    "title": "Turn It Down, Or Else!",
    "info": {
      "directors": ["Alice Smith", "Bob Jones"],
      "release_date": "2013-01-18T00:00:00Z",
      "rating": 6.2,
      "genres": ["Comedy", "Drama"],
      "image_url": "http://ia.media-imdb.com/images/N/O9ERWAU7FS797AJ7LU8HN09AMUP908RLlo5JF90EWR7LJKQ7@@._V1_SX400_.jpg",
      "plot": "A rock band plays their music at high volumes, annoying the neighbors."
    }
  }
]
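To make the point about using the same document format in application code more concrete, here is a minimal sketch that loads JSON documents into native Python structures and filters on a nested field. The first document is a shortened copy of the sample above; the second document, its tagline field, and the `find_by_genre` helper are invented for illustration.

```python
import json

# Two JSON documents with slightly different fields (flexible schema).
# The second document is hypothetical and exists only for this example.
documents = json.loads("""
[
  {
    "year": 2013,
    "title": "Turn It Down, Or Else!",
    "info": {
      "directors": ["Alice Smith", "Bob Jones"],
      "rating": 6.2,
      "genres": ["Comedy", "Drama"]
    }
  },
  {
    "year": 2014,
    "title": "Hypothetical Sequel",
    "info": {
      "directors": ["Alice Smith"],
      "rating": 5.1,
      "genres": ["Comedy"],
      "tagline": "Louder."
    }
  }
]
""")


def find_by_genre(docs, genre):
    """Return titles of documents whose nested genres list contains genre."""
    return [
        doc["title"]
        for doc in docs
        if genre in doc.get("info", {}).get("genres", [])
    ]


print(find_by_genre(documents, "Drama"))
print(find_by_genre(documents, "Comedy"))
```

The nested `info` object and its arrays map directly onto Python dictionaries and lists, so no separate mapping layer is needed in this sketch.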
Advantages
Flexible, semi-structured, and hierarchical
Flexible schema
Use Cases
Catalogs
Mobile
IN-MEMORY
In-memory databases
Description
With the rise of real-time applications, in-memory databases are growing in
popularity. In-memory databases predominantly rely on main memory for data
storage, management, and manipulation. In-memory has been popularized
by open-source software for memory caching, which can speed up dynamic
databases by caching data to decrease access latency, increase throughput, and
ease the load off the main databases.
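[Figure: an in-memory database, which keeps data in memory, compared with a disk-based database, which uses a memory buffer pool, a storage engine, and disk]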
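The caching role described above is often implemented as a read-through (cache-aside) lookup: check memory first and fall back to the main database only on a miss. The following sketch shows that pattern with a small in-process cache and a time-to-live. The `TTLCache` class and the `slow_database_lookup` function are hypothetical stand-ins, not part of any real caching product.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process cache with a per-entry time-to-live."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0):
        self._loader = loader          # called on a miss to fetch the value
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.misses = 0                # counts trips to the main database

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # hit: served from memory
        self.misses += 1               # miss or expired: go to the database
        value = self._loader(key)
        self._entries[key] = (now, value)
        return value


def slow_database_lookup(key: str) -> str:
    # Stand-in for a query against the main (disk-based) database.
    return f"profile-for-{key}"


cache = TTLCache(slow_database_lookup, ttl_seconds=60.0)
first = cache.get("user-1")    # miss: loads from the database
second = cache.get("user-1")   # hit: served from memory
print(first, second, cache.misses)
```

Only the first call reaches the backing lookup, which is how a cache eases load on the main database.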
Advantages
Sub-millisecond latency
Use Cases
Caching
Session store
Leaderboards
Geospatial services
Pub/sub
Real-time streaming
GRAPH
Graph databases
Description
Graph databases are a type of NoSQL database designed to make it easy
to build and run applications that work with highly connected datasets. In a
graph data model, relationships are first class citizens, i.e. they are represented
directly. Using specialized graph languages, like SPARQL or Gremlin, allows you
to easily build queries that efficiently navigate highly connected datasets.
In RDF graphs, the concepts of Nodes, Edges, and Properties are represented as
Resources with Internationalized Resource Identifiers (IRIs).
[Figure: example graph with person nodes Bill, Amit, Sara, and Kevin, a PRODUCT node, and a SPORT node, connected by PURCHASED, KNOWS, and FOLLOWS edges]
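To show what treating relationships as first-class data can look like, the sketch below stores labeled edges directly and answers a two-hop question by walking them. It uses plain Python rather than SPARQL or Gremlin, and the specific edges (who knows whom, who purchased or follows what) are assumptions for this example, since the figure's exact connections are not recoverable here.

```python
from collections import defaultdict

# Labeled, directed edges: (source, label, target).
# These particular connections are assumed for illustration only.
edges = [
    ("Bill", "PURCHASED", "product"),
    ("Amit", "PURCHASED", "product"),
    ("Bill", "KNOWS", "Sara"),
    ("Sara", "FOLLOWS", "sport"),
    ("Kevin", "FOLLOWS", "sport"),
]

# Index edges in both directions so traversals are direct lookups.
out_edges = defaultdict(list)   # (source, label) -> [targets]
in_edges = defaultdict(list)    # (target, label) -> [sources]
for source, label, target in edges:
    out_edges[(source, label)].append(target)
    in_edges[(target, label)].append(source)


def also_purchased(person):
    """People who purchased at least one item that `person` purchased."""
    result = set()
    for item in out_edges.get((person, "PURCHASED"), []):
        for other in in_edges.get((item, "PURCHASED"), []):
            if other != person:
                result.add(other)
    return sorted(result)


print(also_purchased("Bill"))
```

The query follows edges outward and back (person to product to person) without any join tables, which is the style of traversal a recommendation engine might build on.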
Advantages
Ability to make frequent schema changes
Use Cases
Fraud detection
Social networking
Recommendation engines
Knowledge graphs
Data lineage
TIME SERIES
Advantages
Ideal for measurements or events that are tracked, monitored,
and aggregated over time
Continuous queries
Use Cases
DevOps
Application monitoring
Industrial telemetry
IoT applications
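The idea of aggregating measurements over time, mentioned under Advantages in this section, can be sketched as grouping timestamped readings into fixed windows. The example below averages hypothetical sensor readings per minute in plain Python; the readings and the `average_per_minute` helper are invented, and a real time-series database would do this at far larger scale.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical sensor readings: (ISO 8601 UTC timestamp, value).
readings = [
    ("2020-05-01T10:00:05Z", 20.0),
    ("2020-05-01T10:00:45Z", 22.0),
    ("2020-05-01T10:01:10Z", 25.0),
    ("2020-05-01T10:03:30Z", 19.0),
]


def parse(timestamp):
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into an aware UTC datetime."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def average_per_minute(points):
    """Group readings into one-minute buckets and average each bucket."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        minute = parse(timestamp).replace(second=0, microsecond=0)
        buckets[minute].append(value)
    return {
        minute.strftime("%H:%M"): sum(values) / len(values)
        for minute, values in sorted(buckets.items())
    }


print(average_per_minute(readings))
```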
LEDGER
Ledger databases
Description
Ledger databases provide a transparent, immutable, and cryptographically
verifiable transaction log owned by a central trusted authority. Many
organizations build applications with ledger-like functionality because
they want to maintain an accurate history of their applications’ data—for
example, tracking the history of credits and debits in banking transactions,
verifying the data lineage of an insurance claim, or tracking movement of
an item in a supply chain network.
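One common way to make a transaction log cryptographically verifiable is a hash chain, in which each entry's hash covers the previous entry's hash. The sketch below illustrates that general technique with SHA-256; it is an assumption-laden toy, not a description of how any specific ledger database works, and the `Ledger` class and sample transactions are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(previous_hash, payload):
    """Hash the previous entry's hash together with this entry's payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(previous_hash.encode("utf-8") + body).hexdigest()


class Ledger:
    """Append-only log in which each entry is chained to the one before."""

    def __init__(self):
        self._entries = []  # list of (payload, hash) tuples

    def append(self, payload):
        previous = self._entries[-1][1] if self._entries else GENESIS_HASH
        digest = entry_hash(previous, payload)
        self._entries.append((payload, digest))
        return digest

    def verify(self):
        """Recompute every hash; any altered entry breaks the chain."""
        previous = GENESIS_HASH
        for payload, digest in self._entries:
            if entry_hash(previous, payload) != digest:
                return False
            previous = digest
        return True


ledger = Ledger()
ledger.append({"account": "A-1", "type": "credit", "amount": 100})
ledger.append({"account": "A-1", "type": "debit", "amount": 40})
print(ledger.verify())
```

Because every hash depends on all earlier entries, changing any historical record causes `verify` to fail, which is what makes the history tamper-evident.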
Advantages
Maintain accurate history of application data
Cryptographically verifiable
Highly scalable
Use Cases
Finance – Keep track of ledger data such as credits and debits
Retail and supply chain – Access info on every supply chain stage
WIDE COLUMN
Advantages
Scalable
Flexible
Use Cases
High scale industrial applications for:
Equipment maintenance
Fleet management
Route optimization
Data logs
Geographic data
PURPOSE-BUILT
Today’s developers need diverse data models that match a variety of use cases.
Finding the right tool for the right job can be challenging, but we hope this
document helps you simplify the process.
To get the most out of these different database types, however, you’ll need to
first migrate your data, databases, and applications to the cloud. And remember,
not all cloud providers are created equal. You’ll want a provider that offers the
performance, scale, and availability of commercial databases and also the
cost-effectiveness, freedom, and flexibility of open-source databases.
CASE STUDY
Airbnb also uses Amazon DynamoDB to store user search history, and Amazon
ElastiCache to store session state in memory for faster (sub-millisecond)
site rendering.
CONCLUSION
The following section gives a more detailed look at these AWS database solutions.
SOLUTIONS
Relational
Amazon Aurora: MySQL and PostgreSQL-compatible relational
database built for the cloud. Combines the performance and
availability of traditional enterprise databases with the simplicity
and cost-effectiveness of open source databases.
Key value
Amazon DynamoDB: Fully managed (serverless) key value database that
delivers single-digit millisecond performance at any scale. Multi-region,
multi-master database with built-in security, eventual and strong consistency
of reads, ACID-compliant for transactional operations across one or more
rows, backup and restore, and in-memory caching. Highest levels of
availability and scaling elasticity.
Document
Amazon DocumentDB: Fast, scalable, highly available, and fully
managed document database service that supports MongoDB
workloads. Designed from the ground up for mission-critical
performance, scalability, and availability.
In-memory
Amazon ElastiCache for Redis: Blazing fast, fully managed in-memory
data store compatible with Redis. Provides sub-millisecond latency to
power internet-scale, real-time applications.
Graph
Amazon Neptune: Fast, reliable, fully managed graph database service
that makes it easy to build and run applications that work with highly
connected datasets.
Time Series
Amazon Timestream: Scalable, fully managed, fast time series
database service for IoT and operational applications. Enables
storage and analysis of trillions of events per day at 1/10th
the cost of relational databases.
Wide Column
Amazon Keyspaces (for Apache Cassandra): Scalable, highly available,
and managed Apache Cassandra–compatible database service that
allows you to use your existing Cassandra Query Language (CQL) code
and tools. Serverless solution for building apps that can serve thousands
of requests per second with virtually unlimited throughput and storage.
ABOUT AWS
For 14 years, Amazon Web Services has been the world’s most comprehensive and broadly
adopted cloud platform. AWS offers over 175 fully featured services for compute, storage,
databases, networking, analytics, robotics, machine learning (ML) and artificial intelligence
(AI), Internet of Things (IoT), mobile, security, hybrid, virtual and augmented reality (VR
and AR), media, and application development, deployment, and management from 73
Availability Zones (AZs) within 23 geographic regions, spanning the U.S., Australia, Brazil,
Canada, China, France, Germany, India, Ireland, Japan, Korea, Singapore, Sweden, and the
U.K. Millions of customers, including the fastest-growing startups, largest enterprises, and
leading government agencies, trust AWS to power their infrastructure, become more agile,
and lower costs. To learn more about AWS, visit aws.amazon.com.
©2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.