
Getting Started with NoSQL and Data Scalability

Scalability is the ability of a system to store, manipulate, analyze, and otherwise process ever increasing amounts of data without reducing overall system availability, performance, or throughput. Relational and hierarchical databases scale up by adding more processors, more storage, caching systems, and such. This Refcard covers MongoDB, GigaSpaces XAP, Google App Engine Datastore and more.

Uploaded by

RenZo Mesquita

DZone, Inc. | www.dzone.com
By Eugene Ciurana
INTRODUCTION
#105
CONTENTS INCLUDE:
Introduction
Scalable Data Architecture
Is NoSQL For You?
mongoDB
GigaSpaces XAP
Google App Engine Datastore and more...
The DZone Refcard #43 is an introduction to system high
availability and scalability terminology and techniques
(http://refcardz.dzone.com/refcardz/scalability). The next
logical step is the scalable handling of massive data volumes
resulting from having these powerful processing capabilities.
This Refcard demystifies NoSQL and data scalability
techniques by introducing some core concepts. It also offers
an overview of current technologies available in this domain
and suggests how to apply them.
What is Data Scalability?
Data scalability is the ability of a system to store, manipulate,
analyze, and otherwise process ever increasing amounts of
data without reducing overall system availability, performance,
or throughput.
Data scalability is achieved by a combination of more powerful
processing capabilities and larger but efficient storage
mechanisms.
Relational and hierarchical databases scale up by adding more
processors, more storage, caching systems, and such. Soon
they hit either a cost or a practical scalability limit because
they are difficult or impossible to scale out. These database
management systems are designed as single units that must
maintain data integrity, and enforce schema rules to guarantee
it. This rigidity is what promotes upward, but not outward,
scalability.
Hot
Tip
Oracle RAC is a cluster of multiple computers with
access to a common database. This is considered
only vertically scalable because the processing
(usually in the form of stored procedures) may scale
out, but the shared storage facilities don't scale with
the cluster.
Data integrity and schemas are suited for handling
transactional, normalized, uniform data. They handle
unstructured or rapidly evolving data structures with difficulty
or exponentially larger costs.
Hot
Tip
Data replication is not the same as data scalability!
SCALABLE DATA ARCHITECTURES
There are two general kinds of architectures used for building
scalable data systems: data grids and NoSQL systems.
Implementations of either often share characteristics from the
other.
Data Grids
Data grids process workloads defined as independent jobs
that don't require data sharing among processes. Storage
or network may be shared across all nodes of the grid, but
intermediate results have no bearing on other jobs' progress or
on other nodes in the grid. A MapReduce cluster is a typical
example.
Figure 1 - Data Grid (diagram: Consumer, Master, two Load
Balancers, each in front of four Nodes)
Data grids expose their functionality through a single API
(either a Web service or native to the application programming
language) that abstracts its topology and implementation from
its data processing consumer.
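The independent-job model described for data grids can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a thread pool stands in for the grid's nodes, and the names word_count and run_grid are hypothetical, not part of any product covered in this Refcard.

```python
from concurrent.futures import ThreadPoolExecutor

def word_count(chunk):
    """One independent job: count the words in a single chunk of text."""
    return len(chunk.split())

def run_grid(chunks, workers=4):
    """Dispatch every chunk as its own job and collect the per-job results.

    No job reads another job's intermediate result, which is the property
    that lets a grid spread the work across nodes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(word_count, chunks))

chunks = ["to be or not to be", "that is the question", "whether tis nobler"]
per_job = run_grid(chunks)
total = sum(per_job)   # the only step that combines results
print(per_job, total)
```

Only the final sum combines results, so each job could run on any node in any order.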
IS NoSQL FOR YOU?
Hot
Tip
NoSQL is intended as shorthand for "not only
SQL." Complete architectures almost always mix
traditional and NoSQL databases.
NoSQL setups are best suited for non-OLTP applications that
process massive amounts of structured and unstructured data
at a lower cost and with higher efficiency than RDBMSs and
stored procedures.
Figure 2 - NoSQL Topology (diagram: Consumer; Nodes; Virtual File
System for logical table management, load balancing, and garbage
collection (HDFS, mongoFS, Hypertable); Tablet Servers 0 to n;
Distributed File System, FS 0 to FS n)
Areas of Application
Financial modeling
Data mining
Click stream analytics
Document clustering
Distributed sorting or grepping
Simulations
Inverted index construction
Protein folding
NoSQL
NoSQL describes a horizontally scalable, non-relational
database with built-in replication support. Applications
interact with it through a simple API, and the data is stored in a
schema-free, flat addressing repository, usually as large files
or data blocks. The repository is often a custom file system
designed to support NoSQL operations.
NoSQL storage is highly replicated (a commit doesn't occur
until the data is successfully written to at least two separate
storage devices) and the file systems are optimized for
write-only commits. The storage devices are formatted to handle
large blocks (32 MB or more). Caching and buffering are
designed for high I/O throughput. The NoSQL database
is implemented as a data grid for processing (mapReduce,
queries, CRUD, etc.).
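The replication rule above (no commit until the data reaches at least two storage devices) can be illustrated with a toy model. This is a minimal sketch, not how any particular NoSQL product implements replication: the ReplicatedStore class, its min_copies parameter, and the use of plain dictionaries as storage devices are all assumptions made for the example.

```python
class ReplicatedStore:
    """Toy store: a write commits only if enough replicas can take it."""

    def __init__(self, replicas, min_copies=2):
        self.replicas = replicas        # dicts standing in for storage devices
        self.min_copies = min_copies

    def commit(self, key, value, available):
        """Write to the reachable replicas, but only when at least
        min_copies of them are reachable; otherwise write nothing."""
        reachable = [r for r, up in zip(self.replicas, available) if up]
        if len(reachable) < self.min_copies:
            return False                # commit rejected, no replica touched
        for replica in reachable:
            replica[key] = value
        return True

store = ReplicatedStore([{}, {}, {}])
accepted = store.commit("tom", 42, available=[True, True, False])
rejected = store.commit("sue", 69, available=[True, False, False])
print(accepted, rejected)
```

The second commit fails because only one device is reachable, so no replica ends up holding a value the others lack.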
Areas of Application
Document storage
Object databases
Graph databases
Key/value stores
Eventually consistent key/value stores
This list is the implementation counterpart to the data grid
areas of application; the data store feeds the computational
network and, together, they form the NoSQL database.
NoSQL vs RDBMS vs OO Analogies
NoSQL      RDBMS    OO
Kind       Table    Class
Entity     Record   Object
Attribute  Column   Property
Preparation:
Don't fall prey to the NoSQL fad
Don't be stubborn; neither NoSQL nor traditional
databases apply to all cases
Apply the CAP Theorem to your use cases to
determine feasibility
Brewer's (CAP) Theorem
It's impossible for a distributed computer system to
simultaneously provide all three of these guarantees:
Consistency (all nodes see the same data at the same time)
Availability (node failures don't prevent survivors from
continuing to operate)
Partition tolerance (no failures less than total network
failures cause the system to fail)
Since only two of these characteristics are guaranteed for any
given scalable system, use your functional specification and
business SLA (service level agreement) to determine what your
minimum and target goals for CAP are, pick the two that meet
your requirements, and proceed to implement the appropriate
technology.
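The "pick the two that meet your requirements" step can be written down as a small feasibility check. This is a hedged sketch: the function name pick_two and the string labels are hypothetical, and the check only encodes the rule stated above, not a recommendation for any specific system.

```python
CAP = {"consistency", "availability", "partition tolerance"}

def pick_two(required):
    """Validate a set of required CAP guarantees (at most two) and
    return them as a sorted tuple; raise ValueError otherwise."""
    required = set(required)
    unknown = required - CAP
    if unknown:
        raise ValueError("not a CAP guarantee: %s" % sorted(unknown))
    if len(required) > 2:
        raise ValueError("a distributed system cannot guarantee all three")
    return tuple(sorted(required))

choice = pick_two(["partition tolerance", "consistency"])
print(choice)
```

A requirement set that names all three guarantees raises an error, which signals that the functional specification or SLA needs to be relaxed before a technology is chosen.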
Hot
Tip
Rule of Thumb: NoSQL's primary goal is to achieve
horizontal scalability. It attains this by reducing
transactional semantics and referential integrity.
Use Figure 3 to identify the best match between your
application's CAP requirements and the suggested SQL and
NoSQL systems listed.
Figure 3 - CAP Selection Chart ("Pick Any Two": Consistency,
Availability, Partition tolerance)
Data models shown: Relational, Key-Value, Column-Oriented,
Document-Oriented
Systems shown, in three groups:
RDBMSs (Oracle, MySQL), Aster Data, Green Plum, Vertica
Dynamo, Voldemort, Tokyo Cabinet, KAI, Cassandra, SimpleDB,
CouchDB, Riak
mongoDB, Terrastore, Datastore, Hypertable, Hbase, Redis,
Berkeley DB, Memcache DB, Scalaris
(source: Nathan Hurst's Blog)
mongoDB
mongoDB is a document-based NoSQL database that bridges
the gap between scalable key-value stores like Datastore
and Memcache DB, and the querying and robustness
capabilities of RDBMSs. Some of its main features include:
Document-oriented storage - data is manipulated as
JSON-like documents
Querying - uses JavaScript and has APIs for submitting
queries in every major programming language
In-place updates - atomicity
Indexing - any attribute in a document may be used for
indexing and query optimization
Auto-sharding - enables horizontal scalability
Map/reduce - the mongoDB cluster may run smaller
MapReduce jobs than a Hadoop cluster with significant
cost and efficiency improvements
mongoDB implements its own file system to optimize I/O
throughput by dividing larger objects into smaller chunks.
Documents are stored in two separate collections: files
containing the object meta-data, and chunks that form a
larger document when combined with database accounting
information. The mongoDB API provides functions for
manipulating files, chunks, and indices directly. The
administration tools enable GridFS maintenance.
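The files-plus-chunks layout described above can be sketched without a server. The following is a minimal illustration in plain Python, not the mongoDB driver's GridFS implementation: the helper names, the tiny chunk size, and the document field names are assumptions chosen for readability.

```python
import uuid

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow

def split_into_chunks(name, data, chunk_size=CHUNK_SIZE):
    """Return (file_doc, chunk_docs): one metadata document plus the chunks."""
    file_id = uuid.uuid4().hex          # stand-in for a database-assigned ID
    chunks = [
        {"files_id": file_id, "n": i // chunk_size, "data": data[i:i + chunk_size]}
        for i in range(0, len(data), chunk_size)
    ]
    file_doc = {"_id": file_id, "filename": name,
                "length": len(data), "chunkSize": chunk_size}
    return file_doc, chunks

def reassemble(file_doc, chunks):
    """Rebuild the object from its chunks, ordered by sequence number."""
    own = [c for c in chunks if c["files_id"] == file_doc["_id"]]
    return b"".join(c["data"] for c in sorted(own, key=lambda c: c["n"]))

file_doc, chunks = split_into_chunks("photo.bin", b"0123456789")
restored = reassemble(file_doc, list(reversed(chunks)))
print(len(chunks), restored)
```

Because each chunk carries its parent ID and sequence number, the chunks can be stored and retrieved in any order and still be recombined.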
Figure 5 - mongoDB Cluster (diagram: a Consumer connected to the
mongoDB Server (master), with fail-over to the mongoDB Server
(slave); each server shows Data Storage, the mongod database
daemon, and the mongos sharding daemon)
A mongoDB cluster consists of a master and a slave. The slave
may become the master in a fail-over scenario, if necessary.
Having a master/slave configuration (also known as Active/
Passive or A/P cluster) helps ensure data integrity since only
the master is allowed to commit changes to the store at any
given time. A commit is successful only if the data is written to
GridFS and replicated in the slave.
Hot
Tip
mongoDB also supports a limited master/master
configuration. It's useful only for inserts, queries,
and deletions by specific object ID. It must not
be used if updates of a single object may occur
concurrently.
Caching
mongoDB has a built-in cache that runs directly in the cluster
without external requirements. Any query is transparently
cached in RAM to expedite data transfer rates and to reduce
disk I/O.
Document Format
mongoDB handles documents in BSON format, a binary-
encoded JSON representation. BSON is designed to be
traversable, lightweight and efficient. Applications can map
BSON/JSON documents using native representations like
dictionaries, lists, and arrays, leaving the BSON translation
to the native mongoDB driver specific to each programming
language.
BSON Example
{
  'name' : 'Tom',
  'age' : 42
}
Language    Representation
Python      { 'name' : 'Tom', 'age' : 42 }
Ruby        { "name" => "Tom", "age" => 42 }
Java        BasicDBObject d;
            d = new BasicDBObject();
            d.put("name", "Tom");
            d.put("age", 42);
PHP         array( "name" => "Tom", "age" => 42 );
Dynamic languages offer a closer object mapping to BSON/JSON
than compiled languages.
The complete BSON specification is available from:
http://bsonspec.org/
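To show what "binary-encoded JSON" means in practice, the sketch below hand-encodes the Tom document from the example above. It is a deliberately partial illustration that handles only strings and 32-bit integers; the function names are hypothetical, and real applications should leave this translation to the driver.

```python
import struct

def encode_element(key, value):
    """Encode one key/value pair; only str and int32 values are supported."""
    name = key.encode("utf-8") + b"\x00"        # keys are NUL-terminated
    if isinstance(value, bool):
        raise TypeError("booleans are outside the scope of this sketch")
    if isinstance(value, str):
        raw = value.encode("utf-8") + b"\x00"
        # type 0x02: int32 byte length (including the NUL), then the bytes
        return b"\x02" + name + struct.pack("<i", len(raw)) + raw
    if isinstance(value, int):
        # type 0x10: little-endian 32-bit integer
        return b"\x10" + name + struct.pack("<i", value)
    raise TypeError("this sketch only handles str and int32 values")

def encode_document(doc):
    """Wrap the elements with the int32 total size and a trailing NUL."""
    body = b"".join(encode_element(k, v) for k, v in doc.items())
    total = 4 + len(body) + 1
    return struct.pack("<i", total) + body + b"\x00"

encoded = encode_document({"name": "Tom", "age": 42})
print(len(encoded), encoded)
```

The length prefixes are what make the format traversable: a reader can skip a string value or a whole document without parsing its contents.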
mongoDB Programming
Programming in mongoDB requires an active server running
the mongod and the mongos database daemons (see Figure
5), and a client application that uses one of the
language-specific drivers.
Hot
Tip
All the examples in this Refcard are written in Python
for conciseness.
Starting the Server
Log on to the master server and execute:
[servername:user] ./mongod
The server will display its status messages to the console
unless stdout is redirected elsewhere.
Programming Example
This example allocates a database if one doesn't already exist,
instantiates a collection on the server, and runs a couple of
queries.
The mongoDB Developer Manual is available from:
http://www.mongodb.org/display/DOCS/Manual
#!/usr/bin/env python
from pymongo import MongoClient, ASCENDING, DESCENDING

# connect to the server started in the previous section
client = MongoClient('servername', 27017)
db = client['people_database']     # database, created on first write
peopleList = db['people_list']     # collection, also created lazily

person = {
    'name' : 'Tom',
    'age' : 42 }
peopleList.insert_one(person)
person = {
    'name' : 'Nancy',
    'age' : 69 }
peopleList.insert_one(person)

# find first entry:
person = peopleList.find_one()

# find a specific person:
person = peopleList.find_one({ 'name' : 'Joe' })
if person is None:
    print("Joe isn't here!")
else:
    print(person['age'])

# bulk inserts
persons = [{ 'name' : 'Joe' }, { 'name' : 'Sue' }]
peopleList.insert_many(persons)

# queries with multiple results
for person in peopleList.find():
    print(person['name'])
for person in peopleList.find({'age' : {'$gte' : 21}}).sort('name'):
    print(person['name'])

# count:
nDrinkingAge = peopleList.count_documents({'age' : {'$gte' : 21}})

# indexing
peopleList.create_index([('age', DESCENDING), ('name', ASCENDING)])
The PyMongo documentation is available at:
http://api.mongodb.org/python - guides for other languages
are also available from this web site.
The code in the previous example performs these operations:
Connect to the database server started in the previous
section
Attach a database; notice that the database is treated like
an associative array
Get a collection (loosely equivalent to a table in a
relational database), treated like an associative array
Insert one or more entities
Query for one or more entities
Although mongoDB treats all these data as BSON internally,
most of the APIs allow the use of dictionary-style objects to
streamline the development process.
Object ID
A successful insertion into the database results in a valid
Object ID. This is the unique identifier in the database for a
given document. When querying the database, a return value
will include this attribute:
{
"name" : "Tom",
"age" : 42,
"_id" : ObjectId('999999')
}
Users may override this Object ID with any argument as long as
it's unique, or allow mongoDB to assign one automatically.
Common Use Cases
Caching - more robust capabilities, plus persistence, when
compared against a pure caching system
High volume processing - RDBMS may be too expensive
or slow to run in comparison
JSON data and program objects storage - many RESTful
web services provide JSON data; they can be stored in
mongoDB without language serialization overhead
(especially when compared against XML documents)
Content management systems - JSON/BSON objects can
represent any kind of document, including those with a
binary representation
mongoDB Drawbacks
No JOIN operations - each document is stand-alone
Complex queries - some complex queries and indices are
better suited for SQL
No row-level locking - unsuitable for transactional data
without error-prone application-level assistance
If any of these is part of the functional requirements, a SQL
database would be better suited for the application.
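To make the "No JOIN operations" drawback concrete, here is a sketch of the work that moves into the application when related documents live in separate collections. Plain Python lists stand in for two collections; the data, field names, and function name are hypothetical.

```python
authors = [
    {"_id": 1, "name": "Tom"},
    {"_id": 2, "name": "Nancy"},
]
posts = [
    {"title": "Scaling out", "author_id": 2},
    {"title": "CAP in practice", "author_id": 1},
    {"title": "Sharding notes", "author_id": 2},
]

def join_posts_with_authors(posts, authors):
    """Emulate a JOIN in application code: resolve each post's author by ID."""
    by_id = {a["_id"]: a for a in authors}   # index one side once
    return [
        {"title": p["title"], "author": by_id[p["author_id"]]["name"]}
        for p in posts
    ]

joined = join_posts_with_authors(posts, authors)
for row in joined:
    print(row["title"], "by", row["author"])
```

With a real document store, building the index would cost at least one extra query, and nothing guarantees that both reads see a consistent snapshot. The usual alternative is to embed the author's name in each post document.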
GIGASPACES XAP
Figure 4 - GigaSpaces XAP Data Grid (diagram layers: Application
Frameworks (Jetty, JEE, Spring, Mule, Groovy, .Net, C++, Java);
XAP Management and Monitoring; XAP Deployment Virtualization;
XAP Middleware Virtualization (Virtualized Clustering Layer);
RDBMS, Memcache DB, mongoDB)
The GigaSpaces eXtreme Application Platform is a data
grid designed to replace traditional application servers. It
operates based on an event-processing model where the
application dispatches objects to the processing nodes
associated with a given data partition. The system may be
configured so that data state on the grid may trigger events, or
an application may dispatch specific commands imperatively.
GigaSpaces XAP also manages all threading and execution
aspects of the operation, including thread and connection
pools. GigaSpaces XAP implements Spring transactions and
auto-recovery. The system detects any failed operations
in a computational node and automatically rolls back the
transaction; it then places it back in the space where another
node picks it up to complete processing.
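The recovery behavior described above (a failed operation is rolled back and returned to the space for another node to pick up) can be mimicked with a queue. This is a toy sketch, not the GigaSpaces API: the space is a deque, nodes are plain functions, failures are simulated with RuntimeError, and there is no retry limit, which a real system would need.

```python
from collections import deque

def process_all(tasks, workers):
    """Run tasks from a shared queue; a task whose worker fails is put
    back so that a later worker can complete it."""
    space = deque(tasks)
    results = {}
    turn = 0
    while space:
        task = space.popleft()
        worker = workers[turn % len(workers)]   # round-robin node selection
        turn += 1
        try:
            results[task] = worker(task)
        except RuntimeError:
            space.append(task)   # nothing was recorded, so requeue the task
    return results

def flaky(task):
    raise RuntimeError("simulated node failure")

def healthy(task):
    return task * 2

results = process_all([1, 2, 3], [flaky, healthy])
print(results)
```

Every task eventually lands on the healthy node, so the caller sees complete results even though half of the attempts failed.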
GigaSpaces provides data persistence, distributed processing,
and caching by interfacing with SQL and NoSQL data stores.
The GigaSpaces API abstracts all back-end operations (job
dispatching, data persistence, and caching) and makes
it transparent to the application. GigaSpaces XAP may
implement distributed computing operations like MapReduce
and run them in its nodes, or it may dispatch them for
processing to the underlying subsystem if the functionality
is available (e.g. mongoDB, Hadoop). The application may
implement transactions using the Spring API, the GigaSpaces
XAP transactional facilities, or by implementing a workflow
where specific NoSQL stores handle entities or groups of
entities. This must be implemented explicitly in systems like
mongoDB, but may exist in other NoSQL systems like Google
App Engine's Datastore.
GigaSpaces XAP Programming
The GigaSpaces XAP API is very rich and covers many aspects
beyond NoSQL and data scalability, such as configuration
management, deployment, web services, third-party product
integration, etc.
The GigaSpaces XAP documentation is at:
http://www.gigaspaces.com/wiki/display/XAP71/Programmer%27s+Guide
NoSQL operations may be implemented over these APIs:
SQLQuery - allows querying a space using a SQL-like
syntax and regular expressions; do not confuse it with
JDBC support.
Persistency - mostly supports RDBMSs but may implement
other persistency mechanisms through the External Data
Source Components API.
memcached - support for key/value pair distributed
dictionaries available to any client in the grid; entities
are automatically made available across all nodes. The
memcached API is implemented on top of the data
grid, and it's interchangeable with other memcached
implementations.
Task Execution - allows synchronous and asynchronous job
execution on specific nodes or clusters
The GigaSpaces XAP API is in a minority of stateful NoSQL
systems. Most NoSQL systems strive to achieve statelessness
to increase scalability and data consistency.
Common Use Cases
Real-time analytics - dynamic data analysis and reporting
based on data entered into a system less than a minute
before effective time of use
Map/reduce - distributed data processing of large data
sets across a computational grid
Near-zero downtime - allows for database schema
changes without homebrew master/slave configurations or
proprietary RDBMS dependencies
GigaSpaces XAP NoSQL Drawbacks
Complexity - the server, transactional, and grid model are
more complex than for other NoSQL systems
Application server model - the API and components are
geared toward building applications and transactional
logic
Steeper learning curve
Higher TCO - brings a requirement of a specialized,
well-trained system administration team with higher
requirements than other NoSQL systems
GOOGLE APP ENGINE DATASTORE
Figure 6 - Datastore Architecture (diagram: Your Applications and
Google Applications; Datastore APIs for Java (JDO, JPA), Python,
and other languages; Bigtable with a Master Server (logical table
management, load balancing, garbage collection) and Tablet
Servers 0 to n; Google File System, FS 0 to FS n)
Datastore operations are defined around entities. Entities
can have one-to-many or many-to-many relationships. The
Datastore assigns unique IDs unless the application specifies
a unique key. Datastore also disallows some property names
that it uses for housekeeping. The complete Datastore
documentation is available from:
http://code.google.com/appengine/docs/python/datastore/
Hot
Tip
Did you notice the parallels between Datastore
and mongoDB so far? Many NoSQL database
implementations have solved similar problems in
similar ways.
Transactions and Entity Groups
Datastore supports transactions. These transactions are only
effective on entities that belong to the same entity group.
Entities in a given group are guaranteed to be stored on the
same server.
Datastore Programming
The programming model is based on inheritance of basic
entities, db.Model and db.Expando. Persistent data is
mapped onto an entity specialization of either of these classes.
The API provides persistence and querying instance methods
for every entity managed by the Datastore.
Programming Example
The Datastore API is simpler than other NoSQL APIs and is
highly optimized to work in the App Engine environment. In
this example we insert data into the data store, then run a query:
from google.appengine.ext import db

class Person(db.Model):
    name = db.StringProperty(required=True)
    age = db.IntegerProperty(required=True)

person = Person(name = 'Tom', age = 42)
person.put()
person = Person(name = 'Sue', age = 69)
person.put()
The Datastore is the main scalability feature of Google App
Engine applications. It's not a relational database or a façade
for one. Datastore is a public API for accessing Google's
Bigtable high-performance distributed database system.
Think of it as a sparse array distributed across multiple servers
that also allows an infinite number of columns and rows.
Applications may even define new columns on the fly. The
Datastore scales by adding new servers to a cluster; Google
provides this functionality without user participation.
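The "sparse array distributed across multiple servers" picture can be modeled in a few lines. This is only a mental-model sketch and says nothing about how Bigtable is actually built: the SparseTable class and its methods are hypothetical, and the placement rule is a toy.

```python
from collections import defaultdict

class SparseTable:
    """Rows store only the columns that were actually set."""

    def __init__(self):
        self.rows = defaultdict(dict)     # row key -> {column name: value}

    def put(self, row, column, value):
        self.rows[row][column] = value    # a new column needs no schema change

    def get(self, row, column, default=None):
        return self.rows.get(row, {}).get(column, default)

    def shard_for(self, row, servers):
        """Choose a server for a row; summing the key's bytes is a toy rule."""
        return servers[sum(row.encode("utf-8")) % len(servers)]

table = SparseTable()
table.put("tom", "age", 42)
table.put("sue", "email", "sue@example.com")   # column defined on the fly
print(table.get("tom", "age"), table.get("tom", "email"))
print(table.shard_for("tom", ["server-0", "server-1"]))
```

Rows never pay for columns they do not use, and because each row maps to a server by its key alone, adding servers changes placement without touching the data model.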


DZone, Inc.
140 Preston Executive Dr.
Suite 100
Cary, NC 27513
888.678.0399
919.678.0300
Refcardz Feedback Welcome
[email protected]
Sponsorship Opportunities
[email protected]
Copyright 2010 DZone, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical,
photocopying, or otherwise, without prior written permission of the publisher.
Version 1.0
$7.95
DZone communities deliver over 6 million pages each month to
more than 3.3 million software developers, architects and decision
makers. DZone offers something for everyone, including news,
tutorials, cheatsheets, blogs, feature articles, source code and more.
"DZone is a developer's dream," says PC Magazine.
RECOMMENDED BOOK ABOUT THE AUTHOR
ISBN-13: 978-1-934238-75-2
ISBN-10: 1-934238-75-9
# find everyone older than 20, sorted by name
query = Person.all()  # every entity!
query.filter('age >', 20)
query.order('name')
peopleList = query.fetch(1000)  # up to 1000
for person in peopleList:
    print(person.name)
This example performs these operations:
Defines a Person kind and associates it with the Datastore
Persists new items using the put() method
Defines and executes a query; notice that the query
conditions are expressed as strings
Gql - the Google Query Language
Datastore also allows queries in a custom, SQL-like language.
The query in the previous example could be expressed as:
SELECT * FROM Person WHERE age > 20
ORDER BY name ASC
Gql is useful for writing more expressive queries than those
written in the Python or Java APIs.
Hot
Tip
Careful! Python vs. Gql queries have different
performance and quota characteristics that may
impact your cost or functionality! Refer to the
Datastore documentation for a discussion of how
they differ.
Common Use Cases
Massive scalability - Datastore offers ultimate scalability
by leveraging Google's own infrastructure for persistent
storage
Google App Engine applications - there is no alternative
mechanism for data storage on this platform
Data rich RESTful Web services - use the Datastore
and App Engine infrastructure to offload traditional data
centers when stateless, data-intensive web services must
be implemented
Datastore Drawbacks
Vendor lock-in - persistence and queries are tightly
coupled with the Datastore and the Datastore API is far
from being an industry standard
Availability - Datastore has been known to fail and the
EULA doesn't allow more than 4-nines SLAs
Quotas - Datastore utilization costs per data access and
for processor time
Query limits - Result sets are limited to return a maximum
of 1,000 entities, forcing queries to be needlessly complex
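The extra complexity caused by the result limit usually takes the form of a paging loop. The sketch below shows the general pattern with a hypothetical fetch_page(offset, limit) callable standing in for the real query; it does not use the Datastore API.

```python
def fetch_all(fetch_page, page_size=1000):
    """Collect every result by requesting pages of at most page_size items."""
    results, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        results.extend(page)
        if len(page) < page_size:      # a short page means nothing is left
            return results
        offset += len(page)

# Stand-in data source: 2,500 items served at most 1,000 at a time.
data = list(range(2500))
calls = []

def fake_fetch(offset, limit):
    calls.append((offset, limit))
    return data[offset:offset + limit]

everything = fetch_all(fake_fetch)
print(len(everything), calls)
```

Each page is a separate data access, so the quota costs listed above grow with the number of pages.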
STAYING CURRENT
Do you want to know about specific projects and use cases
where NoSQL and data scalability are the hot topics? Join the
scalability newsletter:
http://eugeneciurana.com/scalablesystems
Eugene Ciurana is an open-source evangelist who specializes in the design
and implementation of mission-critical, high-availability large scale systems.
Over the last two years, Eugene designed and built hybrid cloud scalable
systems for leading financial, software, insurance, and healthcare companies
in the US, Japan, and Europe. As chief liaison between Walmart.com Global
and the ISD Technology Council, he led the official adoption of Linux and
other open-source technologies at Walmart Stores Information Systems
Division in 2006.
Publications
Developing with Google App Engine
DZone Refcard #43: Scalability and High Availability
DZone Refcard #38: SOA Patterns
The Tesla Testament: A Thriller
MongoDB, a cross-platform NoSQL database, is the fastest-
growing new database in the world. MongoDB provides a
rich document orientated structure with dynamic queries that
you'll recognize from RDBMS offerings such as MySQL. In other
words, this is a book about a NoSQL database that does not
require the SQL crowd to re-learn how the database world
works!
books.dzone.com/books/mongodb
