Architect Design and Process Workbook
Google Cloud
Proprietary + Confidential
ClickTravel
“Your ticket to the clouds!”
1b. ClickTravel
Brief description
● Travelers can search and book travel (hotels, flights, trains, cars)
● Pricing will be individualized based on customer preferences and demand
● Strong social media integration with reviews, posts, and analytics
● Suppliers (airlines, hotels, etc.) can upload inventory
Karen
Karen is a busy businesswoman who likes to take luxury weekend breaks, often booked
at the last minute. A typical booking comprises a hotel and flight. Recommendations
play a major role in the choice Karen makes, as does customer feedback. Karen likes to
perform all operations from her phone.
Here are a couple of examples of personas for our online travel portal.
Andrew
Andrew is a student who likes to travel home to visit parents and also takes vacations
twice yearly. His primary concern is cost, and he will always book the lowest price travel
regardless of convenience. Andrew has no loyalty and will use whichever retailer can
provide the best deal.
As a hotel operator, I want to bulk supply hotel inventory so that ClickTravel can sell it on
my behalf.
As a ClickTravel manager, I want to analyze the sales performance data of all our
suppliers so that I can identify poor performers and help them improve.
User story | SLO | SLI
Search hotel and flight | 95% of requests will complete in under 200 ms | Time to last byte GET requests measured every 15 seconds aggregated per 5 minutes
Supply hotel inventory | Error rate of < 0.00001% | Upload errors measured as a percentage of bulk uploads per day by custom metric
Supply hotel inventory | Available 99.9% | Fraction of 200 vs 500 HTTP responses from API endpoint measured per month
Analyze sales performance | 95% of queries will complete in under 10 s | Time to last byte GET requests measured every 60 seconds aggregated per 10 minutes
Here are some example SLOs and SLIs for our travel portal application. Notice that
the SLI describes what we are going to measure and how: for example, the “Fraction
of 200 vs 500 HTTP responses from API endpoint measured per month.” This
example is a way of measuring availability.
The SLO represents the goal we are trying to achieve for a given SLI: for example,
“Available 99.9%” of the time.
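To make the SLI/SLO split concrete, here is a minimal sketch of how the availability and latency indicators could be computed from raw samples and compared against their objectives. The function names and the nearest-rank percentile method are my own assumptions for illustration; in practice a monitoring service would do this aggregation for us.

```python
import math


def availability(status_codes):
    """SLI: fraction of 200 responses among all 200 and 500 responses."""
    ok = sum(1 for code in status_codes if code == 200)
    failed = sum(1 for code in status_codes if code == 500)
    total = ok + failed
    # With no traffic in the window there is nothing to count as a failure.
    return ok / total if total else 1.0


def percentile(values, pct):
    """SLI helper: nearest-rank percentile of a list of samples."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_slo(sli_value, target, higher_is_better=True):
    """SLO check: compare a measured SLI against its target."""
    if higher_is_better:
        return sli_value >= target
    return sli_value <= target


# Hypothetical samples: 999 successes and 1 server error in the window.
codes = [200] * 999 + [500]
print(meets_slo(availability(codes), 0.999))

# Hypothetical search latencies in milliseconds: 19 fast requests, 1 slow one.
latencies_ms = [100] * 19 + [450]
print(meets_slo(percentile(latencies_ms, 95), 200, higher_is_better=False))
```

The point of the sketch is that the SLI functions only measure, while `meets_slo` carries the goal, so we can tighten or relax an objective without touching how the data is collected.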
[Diagram: microservices of the online travel portal. Components shown: Web UI, Mobile UI, Auth Service, Search Service, Inventory Service, Inventory Database, Orders Service, Orders Database, Analytics Service, Data Warehouse, Reporting Service.]
Here is a sample diagram depicting the microservices of our online travel portal. I
suppose we could lay this out many different ways. There isn’t really one and only one
right way to design an application.
Notice, we have separate services for our web and mobile UIs. There’s a shared
authentication service and we have microservices for search, orders, inventory,
analytics and reporting. Remember, each of these services will be deployed as a
separate application. Where possible we want stateless services, but the orders and
inventory services will need databases, and the analytics service will provide a data
warehouse.
This might make a good starting point, and we could adjust as needed when we
start implementing the application.
Here’s an example for our online travel portal. Obviously, our API would be larger than
this, but in a way the APIs are all more of the same. Each service manages and
makes available some collection of data. For any collection of data there are a
handful of typical operations we do with that data.
This is similar to Google Cloud APIs. For example, in Google Cloud we have a service
called Compute Engine, which is used to create and manage virtual machines,
networks, and the like. The Compute Engine API has collections like instances,
instanceGroups, networks, subnetworks, and many more. For each collection, various
methods are used to manage the data.
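As a rough illustration of "a collection of data with a handful of typical operations," here is a minimal in-memory sketch. The `Collection` class, its method names (loosely modeled on list, get, insert, update, and delete), and the `hotels` example are all assumptions for illustration, not the actual ClickTravel API.

```python
import itertools


class Collection:
    """A named collection of items exposing the typical operations."""

    def __init__(self, name):
        self.name = name
        self._items = {}
        self._ids = itertools.count(1)  # simple sequential IDs for the sketch

    def list(self):
        """Return every item in the collection."""
        return list(self._items.values())

    def get(self, item_id):
        """Return one item; raises KeyError if it does not exist."""
        return self._items[item_id]

    def insert(self, data):
        """Store a new item and return it with its assigned ID."""
        item_id = next(self._ids)
        item = {"id": item_id, **data}
        self._items[item_id] = item
        return item

    def update(self, item_id, changes):
        """Apply changes to an existing item; raises KeyError if missing."""
        self._items[item_id].update(changes)
        return self._items[item_id]

    def delete(self, item_id):
        """Remove an item; raises KeyError if it does not exist."""
        del self._items[item_id]


# Hypothetical usage: the inventory service managing a "hotels" collection.
hotels = Collection("hotels")
hotel = hotels.insert({"name": "Grand Plaza", "city": "London"})
hotels.update(hotel["id"], {"city": "Paris"})
print(hotels.list())
```

Every service in the diagram could expose its data this way, which is why the APIs end up looking like "more of the same."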
Service | Structured or Unstructured | SQL or NoSQL | Strong or Eventual Consistency | Amount of Data (MB, GB, TB, PB, ExB) | Read only or Read/Write
Here’s an example for our online travel portal, ClickTravel. We focused on the
inventory, inventory uploads, ordering, and analytics services. As you can see, each
of these services has different requirements that might result in choosing different
Google Cloud services.
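Service | Persistent Disk | Cloud Storage | Cloud SQL | Firestore | Cloud Bigtable | Cloud Spanner | BigQuery
Inventory | - | - | - | X | - | - | -
Inventory uploads | - | X | - | - | - | - | -
Orders | - | - | X | - | - | - | -
Analytics | - | - | - | - | - | - | X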
For the inventory service we will use Cloud Storage for the raw inventory uploads.
Suppliers will upload their inventory as JSON data stored in text files. That inventory
will then be imported into a Firestore database.
The orders service will store its data in a relational database running in Cloud SQL.
The analytics service will aggregate data from various sources into a data warehouse,
for which we’ll use BigQuery.
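The upload-then-import step could look something like the sketch below. It parses one supplier file, validates each record, and keys the valid ones by ID so they are ready to be written as documents. The field names (`hotel_id`, `name`, `rooms`) are a hypothetical schema, and a plain dictionary stands in for the Firestore collection so the example stays self-contained.

```python
import json

# Hypothetical schema: the fields every inventory record must carry.
REQUIRED_FIELDS = ("hotel_id", "name", "rooms")


def parse_inventory(text):
    """Turn the text of one uploaded file into documents keyed by hotel ID.

    Returns (documents, errors), where errors lists the position of each
    rejected record together with the reason it was rejected.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of inventory records")

    documents, errors = {}, []
    for index, record in enumerate(records):
        if not isinstance(record, dict):
            errors.append((index, "not a JSON object"))
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            errors.append((index, "missing " + ", ".join(missing)))
            continue
        documents[str(record["hotel_id"])] = record
    return documents, errors


# Hypothetical upload with one valid record and one incomplete record.
upload = '[{"hotel_id": 7, "name": "Grand Plaza", "rooms": 120}, {"name": "No ID"}]'
documents, errors = parse_inventory(upload)
print(documents)
print(errors)
```

Counting the rejected records per bulk upload would also give us the raw numbers for an upload error rate metric.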
Service | Internet facing or Internal only | HTTP | TCP | UDP | Multi-Regional?
Inventory | Internal | - | X | - | No
Orders | Internal | - | X | - | No
The inventory and orders services are internal and regional using TCP. The other
services need to be facing the internet using HTTP. We decided to deploy these to
multiple regions for lower latency, higher performance, and high availability to our
users who are in multiple countries around the world.
Service | HTTP | TCP | UDP
Search | X | - | -
Inventory | - | X | -
Analytics | X | - | -
Web UI | X | - | -
Orders | - | X | -
Based on those network characteristics, we chose the global HTTP load balancer for
our public-facing services and the internal TCP load balancer for our internal-facing
services.
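The selection rule we applied can be written down as a small decision function. This is only a sketch covering the two combinations that occur in ClickTravel; the function name and the `SERVICES` table layout are assumptions, and a real choice would consider more load balancer types than these two.

```python
def choose_load_balancer(internet_facing, protocol):
    """Map a service's network characteristics to a load balancer type."""
    protocol = protocol.upper()
    if internet_facing and protocol == "HTTP":
        return "global HTTP load balancer"
    if not internet_facing and protocol == "TCP":
        return "internal TCP load balancer"
    # Anything else is outside what this sketch covers.
    raise ValueError(f"no rule for internet_facing={internet_facing}, protocol={protocol}")


# Service name -> (internet facing?, protocol), taken from the tables above.
SERVICES = {
    "Search": (True, "HTTP"),
    "Inventory": (False, "TCP"),
    "Analytics": (True, "HTTP"),
    "Web UI": (True, "HTTP"),
    "Orders": (False, "TCP"),
}

PLAN = {name: choose_load_balancer(*traits) for name, traits in SERVICES.items()}
for name, balancer in PLAN.items():
    print(f"{name}: {balancer}")
```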
[Diagram: network architecture. Labels shown: HTTPS, Global HTTP Load Balancer, Web UI, Search Service, TCP, Inventory Service, Inventory Database, Orders Service, Orders Database, Auth Service (third-party), Analytics Service, Data Warehouse, VPC, VPN, Reporting Service, LAN.]
User traffic from mobile and web will first be authenticated using a third-party service.
Then a global HTTP load balancer directs traffic to our public-facing search and web
UI services.
From there, regional TCP load balancers direct traffic to the internal inventory and
orders services.
The analytics service could leverage BigQuery as the data warehouse with an
on-prem reporting service that accesses the analytics service over a VPN. This might
be good enough to start, and we could refine this once we start implementing it.
[Diagram: deployment by region and zone. Labels shown: europe-west2, us-central1, us-central1-a, us-central1-b, HTTPS, Global HTTP Load Balancer, UI, TCP Load Balancer, Orders Service, Inventory Service, Firestore, Failover.]
For our online travel application, ClickTravel, I’m assuming that this is an American
company, but that I have a large group of customers in Europe.
I could also deploy the backends globally but if I’m trying to optimize cost, I could start
by just deploying those in us-central1. This will create latency for our European users
but I can always revisit this later and have a similar backend in europe-west2.
To ensure high availability, I’ve decided to deploy the Orders and Inventory services to
multiple zones. Because the Analytics service is not customer-facing, I can save
some money by deploying it to a single zone. I again have a failover Cloud SQL
database, and the Firestore database and BigQuery data warehouse are
multi-regional, so I don’t need to worry about a failover for those.
The Cloud SQL database needs a failover for high availability. Because BigQuery and
Firestore are managed, we don’t have to worry about that.
In case of a disaster, I’ll keep backups in a multi-regional Cloud Storage bucket. That
way if there is a regional outage, I can restore in another region.
Here’s an example of some disaster scenarios for our online travel portal, ClickTravel.
Each of our services uses different database services and has different objectives and
priorities. All of that affects how we design our disaster recovery plans.
Our Analytics service in BigQuery has the lowest priority; therefore, we should be able
to re-import data to rebuild analytics tables if a user deletes them.
Our Orders service can’t tolerate any data loss and has to be up and running almost
immediately. For this we need a failover replica in addition to binary logging and
automated backups.
Our Inventory service uses Firestore, and for that we can implement daily automated
backups to a multi-regional Cloud Storage bucket. Cloud Functions and Cloud
Scheduler can help with the recovery procedure.
[Diagram: security design. Labels shown: HTTPS, Google Cloud Armor, Global HTTP Load Balancer, Cloud VPN, Private Google Access.
Firewall rules: allow HTTPS from 0.0.0.0/0; allow SSH from known sources.
Subnets: us-central1, us-east1, europe-west2.
Google Cloud services: Firestore, Cloud SQL, BigQuery.]
First, I configured Google Cloud Armor on a global HTTP load balancer to block any
denied IP addresses. My custom VPC network has a subnet in us-central1 for my
American customers, a backup subnet in us-east1, and a subnet in europe-west2
for my European customers.
My firewall rules only allow SSH from known sources, and although I allow HTTPS
from anywhere, I can always deny IP addresses with Google Cloud Armor at the edge
of Google Cloud’s network. I also configured Cloud VPN tunnels to securely
communicate with my on-premises network for my reporting service.
Now, while my load balancer needs a public IP address, I can secure my backend
services by creating them without external IP addresses. In order for those instances
to communicate with the Google Cloud database services, I enable Private Google
Access. This enables the inventory, orders, and analytics services’ traffic to remain
private, while reducing my networking costs.
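To show how these layers combine, here is a small sketch that evaluates a connection the way the design above intends: HTTPS is open to everyone unless the source is on the edge deny list, SSH is limited to known sources, and everything else is dropped. The address ranges are documentation-only placeholders, and the function is an assumption for illustration rather than how Google Cloud Armor or VPC firewall rules are actually configured.

```python
import ipaddress

# Placeholder ranges (reserved for documentation), not real ClickTravel values.
DENIED_AT_EDGE = [ipaddress.ip_network("203.0.113.0/24")]
KNOWN_SSH_SOURCES = [ipaddress.ip_network("198.51.100.0/24")]


def is_allowed(source_ip, port):
    """Return True if a connection from source_ip to port should be accepted."""
    address = ipaddress.ip_address(source_ip)
    if port == 443:
        # HTTPS is allowed from 0.0.0.0/0, minus anything denied at the edge.
        return not any(address in network for network in DENIED_AT_EDGE)
    if port == 22:
        # SSH is allowed only from known sources.
        return any(address in network for network in KNOWN_SSH_SOURCES)
    # No other rule matches, so the traffic is denied.
    return False


print(is_allowed("192.0.2.10", 443))    # ordinary user over HTTPS
print(is_allowed("203.0.113.5", 443))   # source on the edge deny list
print(is_allowed("198.51.100.7", 22))   # SSH from a known source
print(is_allowed("192.0.2.10", 22))     # SSH from an unknown source
```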
Here’s a rough estimate for the database applications of my online travel portal,
ClickTravel.
I adjusted my orders database to include a failover replica for high availability and
came up with some high-level estimates for my other services. My inventory service
uses Cloud Storage to hold JSON data in text files. Because this is my most
expensive service, I might want to reconsider the storage class or configure object
lifecycle management.
Again, this is just an example, and your costs would depend on your case study.
Firestore estimate based on US multi-regional, 200,0000 reads per day, 10,000 writes,
0 deletes, and 1,000 GB stored.
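The arithmetic behind an estimate like this can be sketched as follows. The unit prices are made-up placeholders, not Google Cloud list prices, and the sketch ignores any free tier. The read count in the note above is ambiguous, so the example call uses 200,000 reads per day purely as an illustration; substitute current prices and your own volumes.

```python
# Placeholder unit prices in dollars; NOT actual Google Cloud pricing.
PLACEHOLDER_PRICES = {
    "per_100k_reads": 0.05,
    "per_100k_writes": 0.15,
    "per_100k_deletes": 0.02,
    "per_gb_month": 0.20,
}


def firestore_monthly_cost(reads_per_day, writes_per_day, deletes_per_day,
                           stored_gb, prices, days=30):
    """Estimate a monthly bill from daily operation counts and stored data."""
    reads = reads_per_day * days / 100_000 * prices["per_100k_reads"]
    writes = writes_per_day * days / 100_000 * prices["per_100k_writes"]
    deletes = deletes_per_day * days / 100_000 * prices["per_100k_deletes"]
    storage = stored_gb * prices["per_gb_month"]
    return round(reads + writes + deletes + storage, 2)


# Illustrative volumes: 200,000 reads, 10,000 writes, 0 deletes per day, 1,000 GB.
estimate = firestore_monthly_cost(200_000, 10_000, 0, 1_000, PLACEHOLDER_PRICES)
print(f"Estimated monthly cost: ${estimate:.2f}")
```

With these placeholder numbers storage dominates the bill, which is the kind of observation that would prompt a second look at how much data we keep and where.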