
Mastering Google App Engine - Sample Chapter

Chapter 1: Understanding the Runtime Environment

Mastering Google App Engine
Build robust and highly scalable web applications with Google App Engine
Mohsin Shafique Hijazee

Google App Engine allows you to develop highly scalable web applications or backends for mobile applications without worrying about system administration plumbing or hardware provisioning issues. Just focus on writing your business logic, the meat of the application, and let Google's powerful infrastructure scale it to thousands of requests per second and millions of users without any effort on your part.

Starting with a walkthrough of what scalability is and how scalable web applications work, this book introduces you to the environment under which your applications operate on Google App Engine. Next, you will learn about Google's datastore, which is a massively scalable distributed NoSQL solution built on top of Bigtable. After that, we will show you how to implement powerful search functionality backed by the datastore. Finally, you will be presented with the deployment and monitoring of your applications in production, along with a detailed look at dividing applications into different working modules. You'll also learn how to execute long-running tasks in the background using queues.

Who this book is written for

If you have been developing web applications in Python or any other programming language such as PHP, Ruby, or Java, but have always wondered how to write highly scalable web applications without getting into system administration and other plumbing, then this is the book for you.

What you will learn from this book

- Develop and scale your applications on top of Google App Engine's runtime environment
- Get a firm grip on Google App Engine's request handling mechanism and write request handlers
- Dive deep into Google's distributed NoSQL, highly scalable datastore and design your application around it
- Implement powerful search functionality backed by the scalable datastore
- Perform long-running tasks in the background using task queues
- Write compartmentalized apps using multitenancy, memcache, and other Google App Engine runtime services
- Deploy, tweak, and manage apps in production on Google App Engine

$49.99 US / £31.99 UK
Prices do not include local sales tax or VAT where applicable.

Visit www.PacktPub.com for books, eBooks, code, downloads, and PacktLib.

In this package, you will find:

The author biography


A preview chapter from the book, Chapter 1 'Understanding the Runtime
Environment'
A synopsis of the book's content
More information on Mastering Google App Engine

About the Author


Mohsin Shafique Hijazee started his programming adventure by teaching himself

C, and later C++, mostly with the Win 32 API and MFC. Later, he worked with Visual
Basic to develop an invoicing application for local distributors. In the meantime, .NET
came along and Mohsin happened to be working with C# and Windows Forms. All
of this was around desktop applications, and all of this happened during his days
at university.

Very few people have had a chance to work with fonts, and that's exactly what Mohsin
happened to do in his first job: developing OpenType fonts for complex right-to-left
calligraphic styles such as Nastaleeq. He developed two different fonts, one based on
characters and joining rules, and the other containing more than 18,000 ligatures;
both are in the public domain.
His first serious interaction with web development started with Ruby on Rails.
Shortly after that, he discovered Google App Engine and found it to be a very
interesting platform despite its initial limitations back in 2008, with Python being
the only available runtime environment. Mohsin kept experimenting with the
platform and deployed many production applications and mobile backends
that are hosted on Google App Engine to this day.
Currently, Mohsin is working as a backend software engineer with a large
multinational Internet company that operates in the online classified space
in dozens of countries across the globe.

Preface
Google App Engine is a Platform as a Service that builds and runs applications on
Google's infrastructure. App Engine applications are easy to build, maintain,
and scale.
Google App Engine allows you to develop highly scalable web applications
or backends for mobile applications without worrying about system administration
plumbing or hardware provisioning issues. You can just focus on
writing your business logic, which is the meat of the application, and let Google's
powerful infrastructure scale it to thousands of requests per second and millions of
users without any effort on your part.
This book introduces you to cloud computing, managed Platform as a Service,
what Google has to offer, and its advantages. It also introduces you to the
sample app that will be built during the course of the book: a small invoice
management application, a sample SaaS application with clients, products, categories,
invoices, and payments. The most complex part is reporting,
as the datastore places certain limitations on it.

What this book covers


Chapter 1, Understanding the Runtime Environment, explains the runtime environment,
how requests are processed and handled, and how App Engine scales. This chapter
also explores the limitations of runtime environments with respect to the request
time and response size, among other factors.
Chapter 2, Handling Web Requests, introduces ways to handle web requests by using
the built-in framework, Django, and others. It also discusses how to serve static files,
deal with caching issues, and render templates.

Preface

Chapter 3, Understanding the Datastore, covers the problem of storing huge amounts
of data and processing it in bulk with the ability to randomly access it. This chapter
explains the datastore in detail, which is built on top of Bigtable.
Chapter 4, Modeling Your Data, explains the new ndb Python library on top of
Google datastore. It will also teach you how to model your data using its API.
Chapter 5, Queries, Indexes, and Transactions, focuses on how to query your data,
the limitations, and ways to work around these limitations.
Chapter 6, Integrating Search, builds upon the datastore and shows how to make
data searchable.
Chapter 7, Using Task Queues, introduces the reader to task queues, which enable
repeated execution of tasks in the background.
Chapter 8, Reaching out, Sending E-mails, talks about how the app can send and
receive e-mails and how to handle bounce notifications.
Chapter 9, Working with the Google App Engine Services, introduces you to the other
services that are provided by Google App Engine to make you aware of your
available options.
Chapter 10, Application Deployment, talks in detail about deploying the GAE apps.

Understanding the Runtime Environment
In this chapter, we will look at the runtime environment that is offered by Google
App Engine. Overall, a few details of the runtime environment pertaining to the
infrastructure remain the same no matter which runtime environment (Java,
Python, Go, or PHP) you opt for.
Of all the available runtimes, Python is the most mature one. Therefore, in order
to master Google App Engine, we will focus on Python alone. Many of the details
vary a bit from runtime to runtime, but in general they have a lot in common.
Having said that, the other runtimes are catching up as well, and all of them
(including Java, PHP, and Go) are out of their respective beta stages.
Understanding the runtime environment will give you a better grasp of the
environment in which your code executes, so you can tweak your code accordingly
and understand why things behave the way they do.
In this chapter, we will cover the following topics:

The overall architecture

Runtime environments

Anatomy of a Google App Engine application

A quick overview of the available services

Setting up the development tools and writing a basic application


The overall architecture


The scaling of a web application is a hard thing to do. Serving a single page to a
single user is a simple matter. Serving thousands of pages to a single or a handful
of users is a simple matter, too. However, delivering just a single page to tens of
thousands of users is a complex task. To better understand how Google App Engine
deals with the problem of scale, we will revisit the whole problem of scaling, how
it has been solved to date, and the technologies and techniques that are at work
behind the scenes. Once armed with this understanding, we will talk about
how Google App Engine actually works.

The challenge of scale


The whole problem of complexity arises from the fact that to serve a simple page, a
certain amount of time is taken by the machine that hosts the page. This time usually
falls in milliseconds, and eventually, there's a limit to the number of pages that can
be rendered and served in a second. For instance, if it takes 10 milliseconds to render
a page on a 1 GHz machine, we can serve 100 pages in one second, which means that
roughly 100 users can be served per second.
However, if there are 300 users per second, we're out of luck as we will only be able
to serve the first 100 lucky users. The rest will get time-out errors, and they may
perceive that our web page is not responding, as a rotating wait icon will appear
on the browser, which will indicate that the page is loading.
Let's introduce a term here. Instead of pages per second, we will call it requests or
queries per second, or simply Queries Per Second (QPS), because users pointing the
browser to our page is just a request for the page.

How to scale with the scale?


We have two options here. The first option is to bring the rendering time down
from 10 milliseconds to 5 milliseconds, which will effectively help us serve double
the number of users. This path is called optimization. It involves many techniques,
such as minimizing disk reads and caching computations instead of doing them on
the fly, and all of that varies from application to application. Once you've applied all
possible optimizations and achieved a newer and better page rendering time, further
reduction won't be possible, because there's always a limit to how much we can
optimize things and there always will be some overhead. Nothing comes for free.


The other way of scaling things up would be to put in more hardware. So, instead of a
1 GHz machine, we can put in a 2 GHz machine, effectively doubling the
number of requests that are processed from 100 to 200 QPS. So now, we can serve
200 users in a second. This method of scaling is called vertical scaling. However,
vertical scaling has its limits: you can put in a 3 GHz processor, then a
3.5 GHz one, or maybe clock it to 4.8 GHz, but clock frequency has
physical limits imposed by how the universe is constructed, and we'll
hit the wall sooner or later. The alternative is that instead of putting up a single
1 GHz machine, we can put two such machines and a third one in front. Now, when
a request comes to the third front-end machine, we can distribute it to either of the
other two machines in an alternate fashion, or to the machine with the least load.
This request distribution can follow many strategies. It can be as simple as a random
selection between the two machines, a round-robin rotation that picks one after the
other, or delegating the request to the least loaded machine; we may even factor in
the past response times of the machines. The main idea and beauty of the whole
scheme is that we are no longer limited by the capacity of a single piece of hardware.
If a 1 GHz machine serves 100 users, we can put up 10 such machines to serve 1,000
users. To serve an audience of 1 million users, we will need ten thousand machines.
This is exactly how Google, Facebook, Twitter, and Amazon handle tens of millions
of users. The following image shows a load balancer at work:

[Figure: Internet -> Load Balancer -> Web Servers. Load balancer splitting the load among machines.]

A critical and enabling component here is the machine at the front, called the load
balancer. This machine runs software that receives requests and delegates
them to the other machines. Many web servers, such as Nginx and Apache, come
with load-balancing capabilities that only need to be configured and activated.
HAProxy is another open source load balancer that has many algorithms at its
disposal for distributing load among the available servers.
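To make these distribution strategies concrete, here is a minimal sketch in plain Python (not from the book's example app); the Server class and its active_requests counter are hypothetical stand-ins for real backend machines and the bookkeeping a real load balancer would do.

import itertools
import random


class Server(object):
    """A hypothetical backend machine tracked by the load balancer."""
    def __init__(self, name):
        self.name = name
        self.active_requests = 0  # requests currently being served


servers = [Server('web-1'), Server('web-2')]

# Strategy 1: round robin, handing requests to each server in turn.
_rotation = itertools.cycle(servers)

def pick_round_robin():
    return next(_rotation)

# Strategy 2: least loaded, picking the server with the fewest requests in flight.
def pick_least_loaded():
    return min(servers, key=lambda s: s.active_requests)

# Strategy 3: random selection between the available servers.
def pick_random():
    return random.choice(servers)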


A very important aspect of this scaling magic is that each machine, when added
to the network, must respond in a manner that is consistent with the responses
of the other machines of the cluster. Otherwise, users will have an inconsistent
experience, that is, they might see something different when routed to one machine
and something else when routed to another machine. For this to happen, even if
the operating system differs (consider an instance where the first machine runs
CPython on Ubuntu and the second one runs Jython on CentOS), the output
produced by each node should be exactly the same. In order to keep things simple,
each machine usually has an identical OS, set of libraries, and configuration.

Scaling in practice
Now that you have a load balancer and two servers, and you're able to handle
about 200 QPS (200 users per second), what happens when your user base grows
to about 500 people? Well, it's simple. You have to go through the following process:
1. Go to a store and purchase three more machines.
2. Put them on racks and plug in the network and power cables.
3. Install an OS on them.
4. Install the required languages/runtimes such as Ruby or Python.
5. Install libraries and frameworks, such as Rails or Django.
6. Install components such as web servers and databases.
7. Configure all of the software.
8. Finally, add the addresses of the new machines to the load balancer
configuration so that it can start delegating user requests to them as well.
You have to repeat the same process for all the three machines that you purchased
from the store.
So, in this way, we scaled up our application, but how much time did it take us to do
all that? Racking the server and plugging in the cables took about 10 minutes, the OS
installation another 15 minutes, and the installation of the software components
consumed about 40 minutes. So approximately, it took about 1 hour and 5 minutes to
add a single node to the cluster. For the three nodes, this amounts to roughly 3 hours
and 15 minutes, and that is only if you're efficient enough not to make a mistake along
the way that forces you to go back, trace what went wrong, and redo things. Moreover,
the sudden spike of users may be long gone by then, as they may feel frustrated by
a slow or unresponsive website. This may leave your newly installed machines idle.

Infrastructure as a Service
This clunky game of scaling was disrupted by another technology called
virtualization, which lets us emulate a virtual machine on top of an operating
system. Now that you have a virtual machine, you can install another operating
system on this virtual machine. You can have more than one virtual machine on a
single physical machine if your hardware is powerful enough, which usually is the
case with server-grade machines. So now, instead of wiring a physical machine
and installing the required OS, libraries, and so on, you can simply spin a virtual
machine from a binary image that contains an OS and all the required libraries, tools,
software components, and even your application code, if you want. Spinning up such
a machine takes a few minutes at most (usually about 40 to 150 seconds). So, this is a
great time saver, as it cuts the time requirement down from an hour and a half to a
few minutes.
Virtualization has created a multibillion-dollar industry and a whole new cool
term, cloud computing, with which consultants of all sorts furnish their resumes.
The idea is to put hundreds of servers on racks with virtualization enabled, let users
spin up virtual machines of their desired specs, and charge them based on usage.
This is called Infrastructure as a Service (IaaS). Amazon, Rackspace, and
DigitalOcean are prime examples of such models.

Platform as a Service
Although Infrastructure as a Service gives a huge boost in building scalable
applications, it still leaves a lot of room for improvement, because you have to take
care of the OS, required libraries, tools, security updates, load balancing,
provisioning of new machine instances, and almost everything in between. This
limitation leads to another solution called Platform as a Service (PaaS),
where everything from the operating system to the required runtime, libraries, and
tools is preinstalled and configured for you. All that you have to do is push your code,
and it starts serving right away. Google App Engine is such a platform, where
everything else is taken care of and all that you have to worry about is your code
and what your app is supposed to do.
However, there's another major difference between IaaS and PaaS. Let's see what the
difference is.


Containers
We talked about scaling by adding new machines to our hosting fleet, which was done
by putting new machines on the rack, plugging in the wires, and installing the
required software; this was tedious, very time-consuming, and took up hours.
We then spoke about how virtualization changed the game. You can instantiate a
whole new (virtual) machine in a few minutes, possibly from an existing disk image,
so that you don't have to install anything. This is indeed a real game changer.
However, even a few minutes is slow at Internet scale. You may have a sudden increase
in traffic and might not be able to afford waiting a few minutes to boot new instances.
There's a faster way, which comes from a few special features in the Linux kernel that
give each executing process its own allocated and dedicated resources. What this
abstract phrase means is that each process gets its own partition of the file system,
CPU, and memory share. The process is completely isolated from the other processes;
hence, it executes in an isolated container. For all practical purposes, this containment
works like a virtual machine. The overhead of creating such an environment is merely
that of spawning a new process, which is not a matter of minutes but of a few seconds.
Google App Engine uses containment technology instead of virtualization
to scale things up. Hence, it is able to respond much faster than any IaaS
solution, which has to load a whole new virtual machine and then a
whole separate operating system on top of the existing operating system,
along with the required libraries.
Containers take a totally different approach to virtualization. Instead of
emulating the whole hardware layer and then running an operating system on top
of it, they provide each running process a totally different view
of the system in terms of file system, memory, network, and CPU. This is mainly
enabled by cgroups (short for control groups), a kernel feature developed by
engineers at Google in 2006 and later merged into Linux kernel 2.6.24,
which allows us to define an isolated environment and perform resource accounting
for processes.
A container is just a separation of resources, such as the file system, memory, and
other resources. This is somewhat similar to chroot on Linux/Unix systems, which
changes the apparent root directory for the currently running process and all of its
children. If you're familiar with chroot, the effect is as if you replaced the hard drive
of your laptop with a hard drive from another laptop with identical hardware but a
different operating system and set of programs. Hence, the mechanism makes it
possible to run totally different applications in each container. So, one container
might be running a LAMP stack and another might be running node.js on the same
machine, with both running on bare metal at native speed with no overhead.

This is called operating system virtualization, and it's a vast subject in itself. Much
more has been built on top of cgroups, such as Linux Containers (LXC) and Docker,
which originally ran on top of LXC or libvirt but recently gained its own library
called libcontainer, which sits directly on top of cgroups. However, the key idea is
process containment, which results in a major reduction of time: you can spin up a
new "virtual machine" in a few seconds, as it is just about launching another
ordinary Linux process, although contained in terms of what it sees of the
underlying system and how.
A comparison of virtual machines versus application containers (App Engine
instances in our case) can be seen in the following diagram:

[Figure: Virtualization vs. container-based App Engine machine instances. On the virtualization side, each VM stacks user code and libraries/binaries on its own guest OS, all running on a hypervisor, host OS, and server hardware. On the App Engine side, each instance holds only user code and libraries, running on the App Engine runtime, service libraries and APIs, the operating system, and server hardware.]

How does App Engine scale?

Now that we understand many of the basic concepts behind how web applications
can be scaled and the technologies that are at work, we can examine how App
Engine scales itself. When a user navigates to your app using their browser, the
first thing that receives the request is a Google front-end server. These servers
determine whether it is a request for App Engine (mainly by examining the HTTP
Host header), and if it is, the request is handed over to the App Engine servers.


The App Engine server first determines whether this is a request for a static resource,
and if that's the case, it is handed over to the static file servers and the whole process
ends here. Your application code never gets executed if a static resource, such as a
JavaScript file or a CSS stylesheet, is requested. The following image shows the
journey of a request through Google App Engine:

[Figure: The journey of a request through Google App Engine. The request travels from the user through their ISP to the closest Google data center (Google Front End and Edge Cache), then over Google's fiber to the App Engine data center, where the App Engine Front End routes it to the static servers or to the app servers running application instances; the App Master (the App Engine management layer) oversees the application instances.]

However, in case the request is dynamic, the App Engine server assigns it a unique
identifier based on the time of receiving it. It is entered into a request queue, where
it shall wait till an instance is available to serve it, as waiting might be cheaper than
spinning a new instance altogether. As we talked about in the section on containers,
these instances are actually containers and just isolated processes. So eventually,
it is not as costly as launching a new virtual machine altogether. There are a few
parameters here that you can tweak, which are accessible from the application
performance settings once you've deployed. One is the minimum latency. It is the
minimum amount of time a request should wait in the queue. If you set this value to
a higher number, you'll be able to serve more requests with fewer instances but at
the cost of more latency, as perceived by the end user. App Engine will wait till the
time that is specified as minimum latency and then, it will hand over the request to
an existing instance. The other parameter is maximum latency, which specifies the
maximum time for which a request can be held in the request queue, after which,
App Engine will spin a new instance if none is available and pass the request to it.
If this value is too low, App Engine will spin more instances, which will result in an
increase in cost but much less latency, as experienced by the end user.

However, by default, if you haven't tweaked these settings (we'll see how to do
this in Chapter 10, Application Deployment), Google App Engine will use heuristics
to determine whether it should spin up a new instance, based on your past request
history and patterns.

[Figure: App Engine requests, request queues, and instances. The App Engine front end tracks the pending latency of queued requests; if the existing instances are busy and the pending latency for a request grows large, a new instance is created to handle the load.]

The last, but a very important, component in the whole scheme of things is the App
Engine master. It is responsible for updates, deployments, and the versioning of
the app. This is the component that pushes static resources to the static servers and
code to the application instances when you deploy an application to App Engine.

Available runtimes
You can write web applications on top of Google App Engine in many programming
languages, and your choices include Python, Java, Go, and PHP. For Python, two
versions of the runtime are available; we will focus on the latest one.
Let's briefly look at each of the environments.


Python
The most basic and important principle of all runtime environments, including that
of Python, is that you can talk to the outside world only by going through Google's
own services. It is like a completely sealed and contained sandbox where you are
not allowed to write to the disk or to connect to the network. However, no program
will be very useful in that kind of isolation. Therefore, you can definitely talk to the
outside world, but only through the services provided by App Engine. You can
also ship your own code and libraries, but they must all be pure Python code;
no C extensions are allowed. This is a limitation and a tradeoff to ensure
that the containers are always identical. Since no external native libraries are
allowed, it can be ensured that the minimal set of required native libraries is
always present on each instance.
At the very beginning, App Engine started with the Python runtime environment,
and version 2.5 was the one that was available for you. It had a few external libraries
too, and it provided a CGI environment for your web app to talk to the world. That
is, when a web request comes in, the environment variables are set from the request,
the body goes to stdin, and the Python interpreter is invoked with the given program.
It is then up to your program to handle and respond to the request. This runtime
environment is now deprecated.
Later, the Python 2.7 runtime environment came along, with new language features
and updated shipped libraries. A major departure from the Python 2.5 runtime
environment was not only the language version, but also a switch from CGI to
WSGI. Because of this switch, it became possible for web apps to process requests
concurrently. This boosted the overall throughput per instance. We will examine CGI
and WSGI in detail in the next chapter.
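To illustrate what the switch from CGI to WSGI means in practice, here is a minimal, hedged sketch of a handler written with webapp2, the WSGI framework bundled with the Python 2.7 runtime; the route and handler names are made up for illustration.

import webapp2


class HelloHandler(webapp2.RequestHandler):
    """Requests are routed to handler methods instead of re-running a CGI script."""
    def get(self):
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello from a WSGI handler')


# The WSGI application object that app.yaml points at (for example, main.app).
app = webapp2.WSGIApplication([
    ('/', HelloHandler),
], debug=True)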

The Java runtime environment


The Java runtime environment presents a standard Servlet version 2.5 environment,
and there are two language versions available: Java 5 and Java 6. The Java 6
runtime environment is deprecated and will soon be removed; it will be replaced,
and new applications will only be able to use Java 7. The app.xml file defines your
application, and you have various standard Java APIs available to talk to Google
services, such as JPA for persistence, JavaMail for mail, and so on.
This runtime environment is also capable of handling concurrent requests.


Go
This runtime environment uses the new Go programming language from Google.
It is a CGI environment too, and it's not possible to handle concurrent requests.
The applications are written in Go version 1.4.

PHP
This is a preview platform, and the PHP interpreter is modified to fit in the scalable
environment, with libraries patched or removed and individual functions
disabled. You get to develop applications just as you would for any normal
PHP web application, but there are many limitations: many of the standard library
modules are either not available or only partially functional. The applications are
written in PHP version 5.5.

The structure of an application


When you are developing a web application that has to be hosted on Google App
Engine, it has to have a certain structure so that the platform can deploy it. A
minimal App Engine application is composed of an application manifest file called
app.yaml and at least one script / code file that handles and responds to requests.
The app.yaml file defines the application ID, version of the application, required
runtime environment and libraries, static resources, if any, and the set of URLs along
with their mappings to the actual code files that are responsible for their processing.
So eventually, if you look at the minimum application structure, it will comprise only
the following two files:

app.yaml

main.py

Here, app.yaml describes the application and the mappings from the set of URLs to
the actual code files. We will examine app.yaml in greater detail in a later section.
app.yaml is not the only file that makes up your application; there are a few other
optional configuration files as well. In case you are using the datastore, there may be
another file called index.yaml, which lists the kinds of indexes that your app will
require. Although you can edit this file, it is automatically generated for you
as your application runs queries locally.


You might have a cron.yaml file as well, which describes various scheduled, repeated
tasks. The queue.yaml file describes your queue configurations so that you can
enqueue long-running tasks for later processing. dos.yaml is the file that your
application might define to prevent DoS attacks.
Most importantly, your application can have one or more logical modules,
where each module runs on separate instances and might have different scaling
characteristics. So, you can have a module defined by api.yaml that handles your
API calls, with its scaling type set to automatic so that it responds to requests
according to the number of consumers. Another, named backend.yaml, handles
various long-running tasks, with its scaling type set to manual with 5 instances on
standby, which keep running all the time to handle whatever long-running
tasks are handed to them.
We will take a look at modules later in this book when discussing deployment
options in Chapter 10, Application Deployment.

The available services


By now, you probably understand the overall architecture and atmosphere in which
our app executes, but it won't be of much use without more services available at our
disposal. Otherwise, with the limitation of pure Python code, we might have to bring
everything that is required along with us to build the next killer web app.
To this end, Google App Engine provides many useful, scalable services that you
can utilize to build your app. Some services address storage needs, others address
the processing needs of an app, and yet another group caters to communication
needs. In a nutshell, the following services are at your disposal:

Storage: Datastore, Blobstore, Cloud SQL, and Memcache

Processing: Images, Crons, Tasks, and MapReduce

Communication: Mail, XMPP, and Channels

Identity and security: Users, OAuth, and App Identity

Others: such as various capabilities, image processing and full text search

If the list seems short, don't worry; Google keeps adding new services all the time.
Now, let's look at each of the previously listed services in detail.


Datastore
Datastore is a NoSQL, distributed, and highly scalable column-based storage
solution that can scale to petabytes of data, so you don't have to worry about
scaling at all. App Engine provides a data modeling library that you can use to
model your data, just as you would with any Object Relational Mapper (ORM),
such as Django models or SQLAlchemy. The syntax is quite similar, but there
are differences.
Each object that you save gets a unique key, which is a long string of bytes. Its
generation is another topic that we will discuss later. Since it's a NoSQL solution,
there are certain limitations on what you can query, which makes it unfit for
everyday use, but we can work around those limitations, as we will explore
in the coming chapters.
By default, apps get 1 GB of free space in datastore. So, you can start experimenting
with it right away.
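As a quick taste of what data modeling looks like (Chapter 4 covers it properly), here is a small sketch using the ndb library; the Invoice kind and its properties are invented for the invoicing sample app and are not a prescribed schema.

from google.appengine.ext import ndb


class Invoice(ndb.Model):
    """A hypothetical datastore kind for the invoicing sample app."""
    client = ndb.StringProperty(required=True)
    amount = ndb.FloatProperty()
    paid = ndb.BooleanProperty(default=False)
    created = ndb.DateTimeProperty(auto_now_add=True)


# Saving an entity returns its unique key; the key fetches the entity back.
invoice = Invoice(client='ACME Corp.', amount=199.99)
key = invoice.put()
same_invoice = key.get()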

Google Cloud SQL


If you prefer using a relational database, you can have that too. It is a standard
MySQL database; you have to boot up instances and connect to them via
whatever interface is available to your runtime environment, such as JDBC in the
case of Java and MySQLdb in the case of Python. Datastore comes with a free quota
of about 1 GB of data, but for Cloud SQL, you have to pay from the start.
Because dealing with MySQL is a topic that has been explored in much detail, from
blog posts and articles to entire books written on the subject, this book skips those
details and focuses on Google's datastore instead.

The Blobstore
Your application might want to store larger chunks of data, such as images, audio,
and video files. The Blobstore does just that for you. You are given a URL, which has
to be used as the target of the upload form. Uploads are handled for you, and the
key of the uploaded file is passed to a specified callback URL, where it can be stored
for later reference. To let users download a file, you simply set the key that
you got from the upload as a specific header on your response, which App Engine
takes as an indication to send the file contents to the user.
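The flow described above looks roughly like the following sketch, which uses the blobstore library together with the helper handlers shipped with the SDK; the URL paths and the form field name are arbitrary examples.

from google.appengine.ext import blobstore
from google.appengine.ext.webapp import blobstore_handlers
import webapp2


class UploadFormHandler(webapp2.RequestHandler):
    def get(self):
        # Ask App Engine for a one-off URL to use as the upload form's target.
        upload_url = blobstore.create_upload_url('/upload')
        self.response.write(
            '<form action="%s" method="POST" enctype="multipart/form-data">'
            '<input type="file" name="file"><input type="submit"></form>' % upload_url)


class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
    def post(self):
        # App Engine has already stored the file; we only receive its key here.
        blob_info = self.get_uploads('file')[0]
        self.redirect('/serve/%s' % blob_info.key())


class ServeHandler(blobstore_handlers.BlobstoreDownloadHandler):
    def get(self, blob_key):
        # send_blob sets the header that tells App Engine to stream the file.
        self.send_blob(blob_key)


app = webapp2.WSGIApplication([
    ('/', UploadFormHandler),
    ('/upload', UploadHandler),
    ('/serve/([^/]+)', ServeHandler),
])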


Memcache
Hitting datastore for every request costs time and computational resources. The
same goes for the rendering of templates with a given set of values. Time is money.
Time really is money when it comes to the cloud, as you pay in terms of the time your
code spends satisfying user requests. This can be reduced by caching content or
queries that occur over and over for the same set of data. Google App Engine
provides you with memcache to play with so that you can supercharge your app's
response times.
When using App Engine's ndb Python library to model and query data, the caching
of data fetched from the datastore is done automatically for you, which was not the
case with the previous version of the library.
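A typical read-through caching pattern with the memcache service looks roughly like this sketch; the key name and the expensive_query placeholder are invented for illustration.

from google.appengine.api import memcache


def expensive_query():
    # Placeholder for a slow datastore query or template rendering step.
    return {'invoices': 42}


def get_dashboard_data():
    """Return cached data if present; otherwise compute it and cache it."""
    data = memcache.get('dashboard_data')
    if data is None:
        data = expensive_query()
        memcache.set('dashboard_data', data, time=600)  # keep for 10 minutes
    return data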

Scheduled Tasks
You might want to perform certain tasks at regular intervals. That's where
scheduled tasks fit in. Conceptually, they are similar to Linux/UNIX cron jobs.
However, instead of specifying commands or programs, you indicate URLs, which
receive HTTP GET requests from App Engine at the specified intervals. You're
required to finish your processing in under 10 minutes; if you want to run
longer tasks, you have that option too by tweaking the scaling options, which
will be examined in the last chapter when we look at deployment. A handler for
such a task is sketched below.
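A minimal sketch of such a handler follows; the /tasks/cleanup URL is an arbitrary example, and the schedule itself lives in the cron configuration file, which maps a schedule line to this URL.

import logging

import webapp2


class CleanupHandler(webapp2.RequestHandler):
    """Invoked by App Engine's cron via HTTP GET at the configured interval."""
    def get(self):
        # Do the periodic work here; it must finish within the allowed time.
        logging.info('Running scheduled cleanup')
        self.response.write('done')


app = webapp2.WSGIApplication([
    ('/tasks/cleanup', CleanupHandler),
])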

Task queues
Besides the scheduled tasks, you might be interested in the background processing
of tasks. For this, Google App Engine allows you to create task queues and enqueue
tasks in them, specifying a target URL and a payload; the tasks are dispatched
at a specified and configurable rate. Hence, it is possible to asynchronously
perform various computations and other pieces of work that otherwise cannot
be accommodated in request handlers.
App Engine provides two types of queues: push queues and pull queues. In push
queues, the tasks are delivered to your code via the URL dispatch mechanism,
and the only limitation is that you must execute them within the App Engine
environment. On the other hand, you can have pull queues, where it's your
responsibility to pull tasks and delete them once you are done. To that end, pull
tasks can be accessed and processed from outside Google App Engine. Each task is
retried with backoffs if it fails, and you can configure the rate at which the tasks get
processed and configure this for each of the task queues or even at the individual
task level itself. The task retries are only available for push queues and for pull
queues, you will have to manage repeated attempts of failed tasks on your own.
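In code, enqueuing a push task is a one-liner with the taskqueue API, roughly as in this sketch; the worker URL, queue name, and payload are invented for illustration.

from google.appengine.api import taskqueue


def enqueue_invoice_email(invoice_id):
    # Push queue: App Engine will POST these params to /worker/send-invoice
    # at the queue's configured rate, retrying with backoff on failure.
    taskqueue.add(
        url='/worker/send-invoice',
        queue_name='default',
        params={'invoice_id': str(invoice_id)})

# With a pull queue, the consumer leases tasks itself and deletes them when done,
# for example: taskqueue.Queue('my-pull-queue').lease_tasks(60, 10)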

Each app has a default task queue, and App Engine lets you create additional queues,
which are defined in the queue.yaml file. Just like the scheduled tasks, each task is
supposed to finish its processing within 10 minutes. However, if it takes longer than
this, we'll learn how to accommodate such a situation when we examine application
deployment in the last chapter.

MapReduce
MapReduce is a distributed computing paradigm that is widely used at Google to
crunch enormous amounts of data, and many open source implementations of the
model now exist, such as Hadoop. App Engine provides MapReduce functionality
as well, but at the time of writing this book, Google has moved the development
and support of the MapReduce libraries for Python and Java to the open source
community, and they are hosted on GitHub. Eventually, these features are bound to
change a lot. Therefore, we'll not cover MapReduce in this book, but if you want to
explore this topic further, check
https://github.com/GoogleCloudPlatform/appengine-mapreduce/wiki for details.

Mail
Google is in the mail business, so your applications can send mail. You can
not only send e-mails but also receive them. If you plan to write your app
in Java, you will use JavaMail as the API to send e-mails. You can of course also use
third-party solutions to send e-mail, such as SendGrid, which integrates
nicely with Google App Engine. If you're interested in this kind of solution, visit
https://cloud.google.com/appengine/docs/python/mail/sendgrid.
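Sending a message from the Python runtime uses the mail API, roughly as in this sketch; the sender must be an address the app is allowed to send from (such as an address on the app's appspotmail.com domain), and the addresses shown are placeholders.

from google.appengine.api import mail


def send_invoice_notification(recipient):
    mail.send_mail(
        sender='billing@your-app-id.appspotmail.com',  # placeholder sender
        to=recipient,
        subject='Your invoice is ready',
        body='Hello,\n\nYour latest invoice is now available.\n')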

XMPP
It's all about instant messaging. You may want to build chat features into your app
or use instant messaging in other innovative ways, such as notifying users about a
purchase, and for that, XMPP services are at your disposal. You can send a message
to a user, while your app receives messages from users in the form of HTTP POST
requests to a specific URL. You can respond to them in whatever way you see fit.
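The sending side boils down to a couple of calls with the xmpp API, as in this hedged sketch; the JID and message text are placeholders, and incoming messages are delivered to a URL that your app must handle.

from google.appengine.api import xmpp


def notify_purchase(user_jid, item_name):
    # Optionally check presence first, then send the instant message.
    if xmpp.get_presence(user_jid):
        xmpp.send_message(user_jid, 'Thanks for purchasing %s!' % item_name)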


Channels
You might want to build something that does not work with the communication
model of XMPP, and for this, you have channels at your disposal. This allows you
to create a persistent connection from one client to the other clients via Google App
Engine. You can supply a client ID to App Engine, and a channel is opened for you.
Any client can listen on this channel, and when you send a message to this channel,
it gets pushed to all the clients. This can be useful, for instance, if you wish to inform
users about the real-time activity of other users, similar to what you notice on Google
Docs when editing a spreadsheet or document together.
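On the server side, the Channel API comes down to two calls, sketched below; the client IDs are arbitrary strings chosen by your app, and the browser side uses the JavaScript channel client with the returned token.

from google.appengine.api import channel


def open_channel_for(client_id):
    # The returned token is handed to the browser, which opens the channel
    # using the JavaScript client.
    return channel.create_channel(client_id)


def broadcast(client_ids, message):
    for client_id in client_ids:
        channel.send_message(client_id, message)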

Users
Authentication is an important part of any web application. App Engine allows
you to generate URLs that redirect users to enter their Google account credentials,
and it manages sessions for you. You also have the option of restricting the sign-in
functionality to a specific domain (such as yourname@yourcompany.com) in case
your company uses Google Apps for business and you intend to build some internal
solutions; you can limit access to the users on your domain alone.
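A typical use of the Users service looks like this short sketch: redirect visitors who are not signed in to a Google sign-in URL, and greet signed-in users with a sign-out link; the handler and routes are illustrative only.

from google.appengine.api import users
import webapp2


class HomeHandler(webapp2.RequestHandler):
    def get(self):
        user = users.get_current_user()
        if not user:
            # Send the visitor to Google's sign-in page, then back to this URL.
            self.redirect(users.create_login_url(self.request.uri))
            return
        self.response.write('Hello, %s! <a href="%s">Sign out</a>'
                            % (user.nickname(), users.create_logout_url('/')))


app = webapp2.WSGIApplication([('/', HomeHandler)])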

OAuth
Did you ever come across buttons labeled Sign in with Facebook, Twitter, Google,
or LinkedIn on various websites? Your app can have similar capabilities as well,
where you let users not only use the credentials that they registered with on your
website, but also sign in with others. In technical jargon, Google App Engine can
act as an OAuth provider.

Writing and deploying a simple application
Now that you understand how App Engine works and the composition of an
App Engine app, it's time to get our hands on some real code and play with it.
We will use Python to develop applications, and we've got a few reasons to do so.
For one, Python is a very simple and easy-to-grasp language. No matter what
your background is, you will be up and running with it quickly. Further, Python is
the most mature and accessible runtime environment because it has been available
since the introduction of App Engine. Furthermore, almost all new experimental and
cutting-edge services are first introduced for the Python runtime environment
before they make their way to the other runtimes.

Enough justification. Now, to develop an application, you will need an SDK for the
runtime environment that you are targeting, which happens to be Python in our case.
To obtain the Python SDK, visit https://developers.google.com/appengine/downloads.
From the download page, select and download the SDK version for your platform.
Now let's examine the installation process for each platform in detail.

Installing an SDK on Linux


The installation of the Linux SDK is quite simple. It is just a matter of downloading
and unzipping the SDK. Besides this, you have to ensure that you have Python 2.7.x
installed, which usually is the case with most Linux distributions these days.
To check whether you have Python, open a terminal and type the following command:
$ python --version
Python 2.7.6

If you get a response that states that the command was not found or your version
number shows something other than 2.7.x (the least significant digit isn't important
here), then you'll have to install Python. For Ubuntu and Debian systems, it will
be simple:
$ sudo apt-get install python2.7

Once you're done with the preceding process, you just have to unzip the SDK
contents into a directory such as /home/mohsin/sdks.
The best way to work with the SDK is to add it to the system's PATH
environment variable. This way, all the command-line tools
will be available from everywhere. To do that, you can modify
the PATH like this:
$ export PATH=$PATH:/path/to/sdk

This change lasts only as long as the shell is active, so it's better
to add the above line to your .bashrc, which is located at
~/.bashrc.
So as you can see, the installation on Linux is pretty simple and
involves simply uncompressing the SDK contents and optionally
adjusting the system path.


Installing an SDK on Mac


The requirement for Python's presence on the system remains the same, and Mac OS X
comes with Python, so this is already satisfied. Now, install the downloaded .dmg file
as you'd install any normal Mac app by performing the following steps:
1. In Finder, browse Go | Applications. This shall open the
Applications folder.
2. Double-click on the .dmg file that you just downloaded and drag the
GoogleAppEngineLauncher icon to the Applications folder.
3. Now, double-click on the Launcher icon that you just dragged to the
Applications folder.
4. When you're prompted to make the symlinks, click on OK
because Launcher alone is just a useful utility that is used to run the App
Engine apps locally, but its GUI lacks many of the features and commands
that are otherwise available in the SDK. So, making symlinks will let you
access them on a terminal from anywhere.
5. Your SDK contents will be at /usr/local/google_appengine.
Now, you're done with the installation.

Installing an SDK on Windows


A little unwarranted rant: Windows is usually not a very good platform for
development if you want to use open source toolchains, because from Ruby to
Python and node.js, everything is developed, tested, and usually targeted for
*nix systems. This is why these tools might not work out of the box on Windows.
That said, the Python SDK for App Engine is available for Windows, and it requires a
Python installation too, which can be downloaded from http://www.python.org.
Download the .msi installer for Python 2.7.x (where x is the latest minor
version, which right now is 10) and follow the instructions. You will then have
everything required to run Python programs. Next, download the Google App
Engine SDK for Windows and install that too, and you are done.

Writing a simple app


Now that we have a good overview of how App Engine scales, available runtimes,
and the services that are at our disposal, it's time to do something real and write our
first app.


We will write a simple app that prints all the environment variables. Before you
write any code, you'll need to create the app on Google App Engine. If you don't do
this, you can still test and run applications locally, but to deploy, you have to
create an app on Google App Engine. To do this, navigate to
http://appengine.google.com. Here, you'll be asked to log in using your Gmail
credentials. Once you've logged in, you will have to click on Create a Project from
the drop-down menu as shown below:

Creating a new project from Google Developer Console.

Once you click this, you'll be presented with this dialog:

Popup to enter information for your new project


In its most basic form, the pop-up would only contain the name of the project, but we
have expanded all the options to demonstrate. The first thing is the Project name and
this can be anything you like it to be. The second thing is the Project ID. This is the
ID that you will use in your app.yaml file. This ID must be unique across all the App
Engine applications and it is automatically generated for you, but you can specify
your own as well. If you specify your own, you will be warned if it is not unique and
you won't be able to proceed.
The next advanced option is about the location that your app will be served
from. By default, all applications are hosted from the data centers located
in the USA, but you can select the European ones. You should select the European
data centers if most of your user base is in or close to Europe. For example, if we're
building an app for which we expect most of the traffic to come from Asia, the
Middle East, or Europe, then it would probably make more sense to go for a
European data center.
Once done, left-click on Compute | App Engine | Dashboard. When presented with
the dialog box, select Try App Engine:

You'll be greeted with this dialog on selecting Google App Engine.

Downloading the example code


You can download the example code files for all Packt books you have
purchased from your account at http://www.packtpub.com. If you
purchased this book elsewhere, you can visit http://www.packtpub.com/support
and register to have the files e-mailed directly to you.


And finally, you'll see the following screen:

The welcome page shows steps to deploy a sample application.

This welcome page appears because you have no application deployed as yet.
Once deployed, you'll see a dashboard, which we'll see in a while.
You can follow the instructions from the welcome page if you want to
deploy a sample application as shown in the preceding screenshot, but for our
purpose, we will deploy our own application. To deploy our own app, all we
need is the project ID; to find it, click on Home on the left side, which
will show the following page:

Your newly created project. All we need is the Project ID


We only need the Project ID from the first box at the top-left, which we will enter
in app.yaml as the application directive, and then we're all good. For example, in
this chapter, we used mgae-01 as the Project ID. Because application IDs must be
unique across all App Engine applications, you cannot use this ID while deploying
your own application; you will have to select something else.
Once you have deployed the app, your dashboard (accessible from Compute |
App Engine | Dashboard) will look like this, instead of the welcome page that
we saw earlier:

The application dashboard of a deployed application

Now that we are done with the basic setup, we will write the code and run and
test it locally.
Create a directory somewhere. Create a file named app.yaml and enter the
following into it:
application: mgae-01
version: 1
runtime: python27
api_version: 1
threadsafe: false
handlers:
- url: /.*
  script: main.py

This app.yaml file is what defines your application. The application directive is the
unique ID that we discussed. The version directive is your version of the app; you
can have multiple versions of the same app. As you change this string, your app will
be deployed as a new version, while the previous version is retained on the App
Engine servers. You can switch back to a previous version from the dashboard
whenever you like. Besides this, you can also split traffic between the various
versions of an application.
The next attribute is the runtime. We have many choices here, such as go if we
want to have our app in the Go programming language. Previously, for Python, we
had a choice of either Python 2.7 or Python 2.5. However, the Python 2.5 runtime
environment is deprecated, and new apps cannot be created with Python 2.5
since January 2014.
Next comes the api_version. This indicates the version of the system services
that you'd like to use. There is only one version of all the available system APIs (the
ones that we discussed under runtime services), but in case Google does release any
incompatible changes to the services, this API version number will be incremented.
Thus, you will still be able to maintain the apps that you developed earlier, and you
can opt for a newer version of APIs if you want to use them in newer applications or
upgrade your existing applications to use newer versions.
Next comes the threadsafe flag. Here, you indicate whether your application is
thread-safe or not. As a rule of thumb, if your code does not write to any global
variables, or compute their values on the fly for later reference, your app is
thread-safe, and multiple requests can be handed over to your App Engine instance
at the same time. Otherwise, you'll be handed a single request at a time, which
you'll have to finish processing before you get the next request.
Multithreading was not available for the Python 2.5 environment because it worked
via CGI, but Python 2.7 supports WSGI, which allows concurrent requests. However,
this particular app uses the 2.7 runtime environment, but it is not a WSGI app. All
of this might seem Greek to you for now, but we shall discuss CGI, WSGI, and
concurrent requests in detail in the next chapter.
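As a rough illustration of the rule of thumb above (not from the book's example app), the first handler below is unsafe under concurrent requests because it mutates module-level state, while the second keeps its state in local variables.

import webapp2

# Not thread-safe: mutable module-level state is shared by concurrent requests.
last_visitor = None


class UnsafeHandler(webapp2.RequestHandler):
    def get(self):
        global last_visitor
        last_visitor = self.request.remote_addr   # two requests can race here
        self.response.write('Last visitor was %s' % last_visitor)


# Thread-safe: everything the handler needs lives in local variables.
class SafeHandler(webapp2.RequestHandler):
    def get(self):
        visitor = self.request.remote_addr
        self.response.write('You are %s' % visitor)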
Next comes the handlers section. Here, we list URLs as regular expressions and
state what has to be done with them. They might be handled by a script or mapped
to a static directory. We'll discuss the latter case in the next chapter, which will let us
serve static application resources, such as images, styles, and scripts. An important
thing that you should note is that the URLs in the list are always checked in the order
in which they are defined, and as soon as the first match is found, the listed action
is taken. Here, we are saying that whatever URL we get, simply execute the given
Python script. This is the CGI way of doing things. WSGI is slightly different,
and we'll examine it in detail later.
[ 23 ]

Understanding the Runtime Environment

So, this was the explanation of the app.yaml, which describes the contents and
details of your app. Next comes the actual script, which will generate the output for
the web page. Create a main.py file in the same directory as that of app.yaml and
enter the following code:
import os
print 'Content-Type: text/plain'
print ''
print "ENVIRONMENT VARIABLES"
print "======================\n"
for key in os.environ:
    print key, ": ", os.environ[key]

Now, let's examine this. This is actually a CGI script. First, we imported a standard
Python module. Next, we wrote to the standard output (stdout); the first print
statement writes an HTTP header, which indicates that we are generating plain text.
Next, the print statement printed a blank line because the HTTP headers are
supposed to be separated by a blank line from the HTTP body.
Next, we actually iterated over all the environment variables and printed them to
stdout, which in turn will be sent to the browser. With that, we're done with our
example application.
Now that we understand how it works, let's run it locally by executing the
following command:
$ ~/sdks/google_appengine/dev_appserver.py ~/Projects/mgae/ch01/hello/

Here, ~/Projects/mgae/ch01/hello is the directory that contains all the previously
mentioned application files. Now, when you point your browser to
http://localhost:8080, you'll find a list of environment variables printed. Hit it with
any URL, such as http://localhost:8080/hello, and you'll find the same output,
except for a few environment variables that might have a different value.

Deploying
Let's deploy the application to the cloud, as follows:
$ ~/sdks/google_appengine/appcfg.py update ~/Projects/mgae/ch01/hello/
--oauth2
10:26 PM Application: mgae-01; version: 1
10:26 PM Host: appengine.google.com
10:26 PM Starting update of app: mgae-01, version: 1
10:26 PM Getting current resource limits.
Email: [email protected]
Password for [email protected]:
10:26 PM Scanning files on local disk.
10:26 PM Cloning 2 application files.
10:27 PM Uploading 2 files and blobs.
10:27 PM Uploaded 2 files and blobs
10:27 PM Compilation starting.
10:27 PM Compilation completed.
10:27 PM Starting deployment.
10:27 PM Checking if deployment succeeded.
10:27 PM Deployment successful.
10:27 PM Checking if updated app version is serving.
10:27 PM Completed update of app: mgae-01, version: 1

This indicates that our app is deployed and ready to serve. Navigate your
browser to http://yourappid.appspot.com and you will see something like this:
REQUEST_ID_HASH : FCD253ED
HTTP_X_APPENGINE_COUNTRY : AE
SERVER_SOFTWARE : Google App Engine/1.9.11
SCRIPT_NAME :
HTTP_X_APPENGINE_CITYLATLONG : 0.000000,0.000000
DEFAULT_VERSION_HOSTNAME : mgae-01.appspot.com
APPENGINE_RUNTIME : python27
INSTANCE_ID : 00c61b117c09cf94de8a5822633c28f2f0e85efe
PATH_TRANSLATED : /base/data/home/apps/s~mgae-01/1.378918986084593129/
main.pyc
REQUEST_LOG_ID :
54230d4200ff0b7779fcd253ed0001737e6d6761652d3031000131000100
HTTP_X_APPENGINE_REGION : ?
USER_IS_ADMIN : 0
CURRENT_MODULE_ID : default
CURRENT_VERSION_ID : 1.378918986084593129
USER_ORGANIZATION :
APPLICATION_ID : s~mgae-01
USER_EMAIL :
DATACENTER : us2
USER_ID :
HTTP_X_APPENGINE_CITY : ?
AUTH_DOMAIN : gmail.com
USER_NICKNAME :

The --oauth2 option will open the browser, where you will have to enter your
Google account credentials. You can do without --oauth2. In this case, you will be
asked for your email and password on the command shell, but you'll also get a notice
that states that this mode of authentication is deprecated.

Let's examine a few interesting environment variables that are set by Google App
Engine. REQUEST_ID_HASH and REQUEST_LOG_ID are set by App Engine to uniquely
identify this request. That's the request ID that we talked about in the section about
how scaling works. The APPENGINE_RUNTIME indicates the runtime environment
that this app is running on. There is a DATACENTER header that is set to us2,
which indicates that our app is being executed in the US data centers. Then, we
have INSTANCE_ID, which is the unique ID that is assigned to the instance
handling this request.
Then, some user-specific headers, such as USER_IS_ADMIN, USER_EMAIL, USER_ID,
USER_NICKNAME, and AUTH_DOMAIN, are set by the Users service that we discussed in
the services section. If a user had logged in, these headers would have their email,
ID, and nickname as values.
These headers are added by Google App Engine and are a feature of the environment
in which your code executes. So that's all, folks!

Summary
This chapter described how App Engine works in terms of scaling and the
anatomy of a typical App Engine application. We then turned our attention towards
the services that are at the disposal of an App Engine application. We had a brief
overview of each one of these services. Next, we moved towards writing a simple
web app that would print all the environment variables. Next, we ran it locally and
deployed it on the cloud to examine its output and noted a few interesting headers
that are added by App Engine.
This understanding of the environment is essential towards mastering Google App
Engine. By now, you have a pretty good understanding of the environment under
which your code executes. In the next chapter, we are going to examine request
handling in detail and check out the options that we have while serving requests.


Get more information on Mastering Google App Engine

Where to buy this book


You can buy Mastering Google App Engine from the Packt Publishing website.
Alternatively, you can buy the book from Amazon, BN.com, Computer Manuals and most internet
book retailers.

www.PacktPub.com
