
INDEX

S. No Topic
Week 1
1 Introduction to Cloud
2 Cloud Computing
3 Cloud vs Traditional Architecture
4 IaaS, PaaS and SaaS
5 Google Cloud Architecture
6 Cloud Computing Recap Quiz
7 Summary - Cloud Computing
8 Introduction - Start with a Solid Platform
9 The GCP Console
10 Understanding Projects
Week 2
11 Billing in GCP
12 Install and Configure Cloud SDK
13 Use Cloud Shell [With Labs]
14 GCP APIs
Week 3
15 Cloud Console Mobile App
16 Recap Quiz - Start with a Solid Foundation
17 Introduction
18 Compute Options in the Cloud
19 Exploring IaaS with Compute Engine [With Lab]
Week 4
20 Configuring Elastic Apps with Autoscaling
21 Exploring PaaS with App Engine [With Lab]
22 Event Driven Programs with Cloud Functions [With Lab]
23 Containerizing and Orchestrating Apps with GKE
24 Summary
Week 5
25 Introduction
26 Storage Options in the Cloud
27 Structured and Unstructured Storage in the Cloud
28 Unstructured Storage using Cloud Storage [With Lab]
Week 6
29 SQL Managed Services
30 Exploring Cloud SQL [With Lab]
31 Cloud Spanner as a Managed Service
32 NoSQL Managed Services Options
Week 7
33 Cloud Datastore a NoSQL Document Store [With Lab]
34 Cloud Bigtable as a NoSQL Option
35 Summary
36 Introduction to API
Week 8
37 The Purpose of APIs
38 Cloud Endpoints [With Lab]
39 Using Apigee
40 Managed Message Services
Week 9
41 Cloud Pub/Sub [With Lab]
42 Recap Quiz - There's an API for that!
43 Introduction - Cloud Security
44 Introduction to security in the cloud
45 Understanding the shared security model
Week 10
46 Explore encryption options
47 Understand authentication and authorization [With Lab]
48 Identify best practices for authorization
49 Recap Quiz - Security
50 Summary - Security
51 Introduction
Week 11
52 Intro to Networking in the Cloud
53 Defining a Virtual Private Cloud
54 Public and Private IP Address Basics
55 Google's Network Architecture
56 Routes and Firewall Rules in the Cloud [With Lab]
57 Multiple VPC Networks [With Lab]
58 Building Hybrid Clouds
59 Different Options for Load Balancing [With Labs]
Week 12
60 Recap Quiz
61 Summary
62 Introduction - Let Google keep an eye on things
63 Introduction to IaC
64 Cloud Deployment Manager
65 Monitoring and Managing Your Services, Apps, and Infra
66 Stackdriver [With Lab]
67 Recap Quiz - Let Google keep an eye on things
68 Summary - Let Google keep an eye on things
69 Introduction - You have the data, but what are you doing with it?
70 Intro to Big Data Managed Services in the Cloud
71 Leverage Big Data Operations with Cloud Dataproc [With Labs]
72 Build ETL Pipelines using Cloud Dataflow [With Labs]
73 BigQuery, Google's Enterprise Data Warehouse
74 Recap Quiz - You have the data, but what are you doing with it?
75 Summary - You have the data, but what are you doing with it?
76 Introduction
77 Introduction to ML
78 ML and GCP
79 Building Bespoke ML models
Week 13
80 Cloud AutoML [With Lab]
81 Google's Pre-trained ML APIs [With Labs]
82 Recap Quiz
Week 14
83 Summary
Google Cloud Computing Foundation Course
Sowmya Kannan
Department of Computer Science
Indian Institute of Technology Kharagpur

Lecture-1
Module Introduction

Hi, I am Sowmya. Welcome to module one, where you will find out the answer to the question: so, what's the cloud anyway?

(Refer Slide Time: 00:10)

From the course map, you can see that you are at the beginning of your journey through the Google Cloud Computing Foundations course.

(Refer Slide Time: 00:20)

The objective of this module is for you to be able to discuss what the cloud is and why it is a technological and business game-changer. The specific learning objectives to achieve this include being able to discuss cloud computing; compare and contrast physical, virtual, and cloud architectures; define Infrastructure as a Service, Platform as a Service, and Software as a Service; and identify some of the advantages of leveraging the cloud.

(Refer Slide Time: 00:56)

This module starts by discussing the characteristics of cloud computing before looking at how traditional architectures compare to the cloud. You will then look at the key differences between Infrastructure as a Service, Platform as a Service, and Software as a Service, together with the GCP services that fall within these categories. The final topic will focus on Google-specific offerings in the cloud. You will end the module with a short quiz and a module recap.

Google Cloud Computing Foundation Course
Sowmya Kannan
Department of Computer Science
Indian Institute of Technology Kharagpur

Lecture-2
Cloud computing

So, let us go ahead and look at the characteristics of cloud computing. Cloud computing has 5
fundamental characteristics.

(Refer Slide Time: 00:12)

First, computing resources are on-demand and self-service. Cloud computing customers use an automated interface and get the processing power, storage, and network they need without the typical complex configurations required when building physical servers. Second, resources are accessible over a network from any location. Third, providers allocate resources to customers from a large pool, allowing customers to benefit from economies of scale.

The resources exist in multiple locations all over the world; you just have to decide the available geographic location you wish to utilize. Fourth, resources are elastic: if you need more resources you can get them rapidly, and when you need fewer you can scale back. Finally, you pay only for what you use or reserve as you go. If you stop using resources, you simply stop paying.

(Refer Slide Time: 01:17)

Consider the example of a city. Infrastructure is the basic underlying framework of facilities and
systems, such as transport, communications, power, water, fuel, and other essential services. The
people in the city are like users, and the cars and bikes and buildings in the city are like
applications. Everything that goes into creating and supporting those applications or buildings
for the users or citizens is the infrastructure.

The purpose of this course is to explore, as efficiently and clearly as possible, the infrastructure services provided by Google Cloud Platform, or GCP. You will become familiar enough with the infrastructure services to know what the services do and have a good grounding in how to use them. By the end of this course, you will be sufficiently prepared to learn anything you need to know to use GCP.

Google Cloud Computing Foundation Course
Sowmya Kannan
Department of Computer Science
Indian Institute of Technology Kharagpur

Lecture-3
Cloud Versus Traditional Architecture

So, how do traditional architectures compare to the cloud? This topic aims to answer that question.
(Refer Slide Time: 00:08)

Cloud computing is essentially the continuation of a model where you can rent out computing infrastructure and have it managed by dedicated professionals. Equinix and CenturyLink are two of the largest data center providers in the US. They aren't exactly household names, though. So why are the likes of Amazon, Microsoft, and Google even in this business? In particular, why is Google doing cloud? The concept of cloud computing began with colocation.

Instead of operating your own data center, you rented space in a colocation facility. This was the
first wave of outsourcing IT. With colocation, the transfer of ownership was minimal. You still
own the machines and you maintain them. Traditionally, colocation is not thought of as cloud
computing but it did begin the process of transferring IT infrastructure out of your organization.

6
Organizations saved money with colocation by not having to build the data center and all of the
associated services.

The colocation provider would simply rent out all of this to your organization. Next, cloud computing involved virtualized data centers, virtual machines, and APIs. Virtualization provides elasticity: you automate infrastructure procurement instead of purchasing hardware. With virtualization, you still maintain the infrastructure. It's still a user-controlled, user-configured environment. This is the same as an on-premises data center.

But now, the hardware is in a different location. The next wave of cloud computing was to a
fully automated elastic cloud. This involved a move from a user-maintained infrastructure to
automated services. In a fully automated environment, developers don’t think of individual
machines. The service automatically provisions and configures the infrastructure used to run
your applications. Google was uniquely positioned to propel organizations into this next wave of
cloud computing. But what does Google have to do with the cloud?
(Refer Slide Time: 02:34)

We believe that in the future every company, regardless of size or industry, will differentiate itself from its competitors through technology. Largely, that technology will be in the form of software. Great software is centered on data. Therefore, every company is or will become a data company. Google Cloud provides a wide variety of services for managing and getting value from data at scale.
(Refer Slide Time: 03:09)

This image shows our data center in Hamina, Finland. The facility is one of the most advanced and efficient data centers in the Google fleet. Its cooling system, which uses seawater from the Bay of Finland, reduces energy use and is the first of its kind anywhere in the world. We are one of the world's largest corporate purchasers of wind and solar energy. We've been 100% carbon neutral since 2007. The virtual world is built on physical infrastructure, and all those racks of humming servers use vast amounts of energy.

Together all existing data centers use roughly 2% of the world's electricity. So, we work to make
data centers run as efficiently as possible. Our data centers were the first to achieve ISO 14001
certification, a standard that maps out a framework for improving resource efficiency and
reducing waste.

Google Cloud Computing Foundation Course
Sowmya Kannan
Department of Computer Science
Indian Institute of Technology Kharagpur

Lecture-4
IaaS, PaaS and SaaS

This topic considers the key differences between Infrastructure-as-a-Service, Platform-as-a-Service, and Software-as-a-Service, together with the GCP services that fall within these categories.

(Refer Slide Time: 00:17)

With Infrastructure-as-a-Service, the service provides the underlying architecture for you to run
servers. The resources to run are provided but it is up to the user to manage the operating system
and the application.

(Refer Slide Time: 00:34)

Platform-as-a-Service takes it one step further. Now, the entire environment will be managed for you, the user, and all that is required of you is to manage your applications. The operating system layer will be managed as part of the service.

(Refer Slide Time: 00:54)

For Software-as-a-Service, the infrastructure, platform, and software are managed for you. All
that’s required is that you bring your data to the system. A few commercial examples of SaaS
include SAP and Salesforce.

(Refer Slide Time: 01:11)

Virtualized data centers brought you Infrastructure-as-a-Service and Platform-as-a-Service offerings. IaaS offerings provide you with raw compute, storage, and network, organized in ways familiar to you from physical and virtualized data centers. PaaS offerings, on the other hand, bind your code to libraries that provide access to the infrastructure your application needs, allowing you to focus on your application logic.

In the IaaS model, you pay for what you allocate. In the PaaS model, you pay for what you use.
As cloud computing has evolved the momentum has shifted towards managed infrastructure and
managed services.

Google Cloud Computing Foundation Course
Sowmya Kannan
Department of Computer Science
Indian Institute of Technology Kharagpur

Lecture-5
Google Cloud Architecture

(Refer Slide Time: 00:07)

Next, you will focus on Google-specific offerings in the cloud. Google Cloud Platform's products and services can be broadly categorized as compute, storage, big data, machine learning, networking, and operations or tools. Leveraging compute can include virtual machines via Compute Engine, running Docker containers in a managed platform using Google Kubernetes Engine, deploying applications in a managed platform like App Engine, or running event-based serverless code using Cloud Functions.

A variety of managed storage options are available as well. For unstructured storage there is Cloud Storage; for managed relational databases there is Cloud SQL or Cloud Spanner; and for NoSQL there are options like Cloud Datastore or Cloud Bigtable.

(Refer Slide Time: 01:00)

Managed services dealing with big data and machine learning are available as well.
(Refer Slide Time: 01:07)

Our data centers around the world are interconnected by the Google Network which by some
publicly available estimates carries as much as 40% of the world's Internet traffic today. This is
the largest network of its kind on earth and it continues to grow. It’s designed to provide the
highest possible throughput and the lowest possible latencies for applications. The network
interconnects with the public internet at more than 90 internet exchanges and more than 100
points of presence worldwide.

When an internet user sends traffic to a Google resource, we respond to the user’s request from
an edge network location that will provide the lowest delay or latency. Our edge caching network
places content close to end-users to minimize latency. Applications in GCP can take advantage
of this edge network too.

(Refer Slide Time: 02:14)

Google Cloud divides the world into three multi-regional areas: the Americas, Europe, and Asia Pacific. Next, the multi-regional areas are divided into regions, which are independent geographic areas on the same continent. Within a region there is fast network connectivity, generally with round-trip network latency of under one millisecond at the 95th percentile. As you can see, one of the regions in Europe is europe-west2 (London).

Finally, regions are divided into zones, which are deployment areas for GCP resources within a focused geographic area. You can think of a zone as a datacenter within a region, although strictly speaking, a zone isn't necessarily a single datacenter. Compute Engine virtual machine instances reside within a specific zone. If that zone became unavailable, so would your virtual machines and the workloads running on them. Deploying applications across multiple zones enables fault tolerance and high availability.
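
As a quick illustration of how regions and zones appear in practice, the commands below list them with the gcloud tool, which is introduced later in this course. This is a minimal sketch and assumes the Cloud SDK is installed and authenticated.

    gcloud compute regions list
    gcloud compute zones list --filter="region:europe-west2"

Each zone name, such as europe-west2-a, is simply the region name plus a zone letter.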

(Refer Slide Time: 03:39)

Behind the services provided by the Google Cloud Platform lies a huge range of GCP resources: physical assets, such as physical servers and hard disk drives, and virtual resources, such as virtual machines and containers. We manage these resources within our global datacenters. As of mid-2019, GCP has expanded across 20 regions, 61 zones, and more than 200 countries and territories. This expansion will continue.

(Refer Slide Time: 04:17)

When you take advantage of GCP services and resources, you get to specify those resources' geographic locations. In many cases, you can also specify whether you are doing so at a zonal level, a regional level, or a multi-regional level. Zonal resources operate within a single zone, which means that if a zone becomes unavailable, the resources won't be available either. A simple example could be a Compute Engine virtual machine instance and its persistent disks. GKE has a component called a node, and these are zonal too.

(Refer Slide Time: 05:00)

Regional resources operate across multiple zones but still within the same region. An application using these resources can be redundantly deployed to improve its availability. Finally, global resources can be managed across multiple regions. These resources can further improve the availability of an application. Some examples of such resources include HTTP(S) load balancers and Virtual Private Cloud networks.

The GCP resources you use, no matter where they reside, must belong to a project. So, what is a project? A project is the base-level organizing entity for creating and using resources and services and for managing billing, APIs, and permissions. Zones and regions physically organize the GCP resources you use, and projects logically organize them. Projects can be easily created, managed, deleted, or even recovered from accidental deletions.

(Refer Slide Time: 06:17)

Each project is identified by a unique project ID and project number. You can name your project
and apply labels for filtering. These labels are changeable but the project ID and project number
remain fixed. Projects can belong to a folder, which is another grouping mechanism.

(Refer Slide Time: 06:43)

You should use folders to reflect the hierarchy of your enterprise and apply policies at the right levels in your enterprise. You can nest folders inside folders. For example, you can have a folder for each department, and within each department's folder, you can have subfolders for each of the teams that make it up. Each team's projects belong to its folder. A single organization owns the folders beneath it. An organization is the root node of a GCP resource hierarchy.

Although you are not required to have an organization to use GCP, organizations are very useful. Organizations let you set policies that apply throughout your enterprise. Also, having an organization is required to use folders. The GCP resource hierarchy helps you manage resources across multiple departments and multiple teams within an organization. You can define a hierarchy that creates trust boundaries and resource isolation.

For example, should members of your Human Resources team be able to delete running database
servers, and should your engineers be able to delete the database containing employee salaries?
Probably not, in either case. Cloud Identity and Access Management also called IAM lets you
fine-tune access control to all the GCP resources you use. You define IAM policies that control
user access to resources.
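
As an illustration of how such a policy is applied at the project level, the following is a minimal sketch using the gcloud CLI; the project ID, user, and role shown are hypothetical examples.

    # Grant a user read-only access to everything in one project
    gcloud projects add-iam-policy-binding my-sample-project \
        --member="user:analyst@example.com" \
        --role="roles/viewer"

    # Inspect who has which roles on the project
    gcloud projects get-iam-policy my-sample-project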

Google Cloud Computing Foundation Course
Sowmya Kannan
Department of Computer Science
Indian Institute of Technology Kharagpur

Lecture-6
Quiz

(Refer Slide Time: 00:05)

Now, it is time to test what you have learned with a quiz. Which of the following is not a
fundamental attribute of the cloud? The answer is C. Select the PaaS resource from the available
options. App Engine is Google's Platform-as-a-Service. Which of the following is an example of
a zonal resource? Compute Engine is an example of a zonal resource.

Google Cloud Computing Foundation Course
Sowmya Kannan
Department of Computer Science
Indian Institute of Technology Kharagpur

Lecture-7
Summary

That concludes the module. So, what is the cloud anyway? The simple answer to this question is that the cloud, or cloud computing, refers to software and services that run on the Internet instead of locally on a computer. The advantage of the cloud is that you can access your information on any device with an internet connection. Another benefit of the cloud is that, because the remote servers handle much of the computing and storage, you do not necessarily need an expensive high-end machine to get your work done. Let's look at some of the key learning points from this module.

(Refer Slide Time: 00:41)

At the start of this module, you learned that cloud computing has 5 fundamental characteristics: on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service. You also drew comparisons between a typical city infrastructure and an IT infrastructure. You then considered how cloud computing is the continuation of a long-term shift in how computing resources are managed, looking back at how things were in the 1980s, how things have changed through today, and where things are heading next.

You also learned about Google's unique positioning to propel organizations into the next wave of cloud computing. You considered the key differences between Infrastructure-as-a-Service, which provides the underlying architecture to run servers; Platform-as-a-Service, which manages the environment for the user, leaving them to manage their applications; and Software-as-a-Service, which manages the infrastructure, platform, and software, requiring users to only bring their data to the system.

In the final topic, you learned about the range of services that GCP offers in the areas of
compute, storage, big data, and machine learning.

(Refer Slide Time: 02:14)

You were also introduced to the scope of the Google Network as well as the concepts of regions
and zones. Lastly, you considered the relationship between GCP resources and projects.

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-8
Module Introduction

(Refer Slide Time: 00:05)

Hi, I am Jimmy. Welcome to the module Start with a Solid Platform. In the first module, you explored what the cloud is and why it is a technological and business game-changer. Start with a Solid Platform is the second module of the Google Cloud Computing Foundations course, and we will look at the different ways that you can interact with the Google Cloud Platform.

(Refer Slide Time: 00:20)

The objective of this module is for you to be able to describe the different ways a user can interact with the Google Cloud Platform. The specific learning objectives to achieve this include being able to discuss how to navigate the GCP environment with the GCP Console, explain the purpose and process of creating GCP projects, explain how billing works, and detail how to install and set up the Cloud Software Development Kit, or Cloud SDK.

(Refer Slide Time: 00:54)

You will also be able to describe how Cloud Shell can be used as an alternative way to access your cloud resources directly from your browser, and how the services that make up GCP offer APIs so that code that you write can control them. Finally, you will be able to discuss how you can manage services from a mobile app.

(Refer Slide Time: 01:15)

This module starts by exploring the GCP Console, followed by an introduction to projects and billing. The topics that follow address installing and configuring the Cloud SDK and how to use Cloud Shell as an alternative to the Cloud SDK. You will then complete your first lab, where you will take your first steps with GCP by getting hands-on practice with the GCP Console. You will then complete a second lab where you will apply a range of gcloud commands within Google Cloud Shell to connect to storage services hosted on GCP.

The topics that follow introduce GCP APIs as well as the Cloud Console mobile app, before a short quiz and module recap.

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-09
The GCP Console

(Refer Slide Time: 00:01)

Let’s start with our first topic, the GCP Console.

(Refer Slide Time: 00:04)

There are four ways you can interact with Google Cloud Platform, and you will be introduced to each in turn: the GCP Console, which provides a web UI; the Cloud SDK and Cloud Shell, which provide a range of command-line tools; APIs for custom applications; and the Cloud Console mobile app.

(Refer Slide Time: 00:25)

The GCP console provides a Web-based GUI for you to manage your GCP projects and
resources.

(Refer Slide Time: 00:32)

It serves as a centralized console for all of your project data and lets you execute common tasks using simple mouse clicks. This way, there is no need to remember commands, and there is no risk of introducing typing errors. When you use the GCP Console, the resources you create are created in a specific project. You can create multiple projects, so you can use projects to separate your work in whatever way makes sense for you.

For example, you might start a new project if you want to make sure only certain team members
can access the resources in that project. In contrast, all team members can continue to access
resources in another project. The GCP Console is also great for developers. Cloud source
repositories provide Git version control to support the collaborative development of any
application or service.

The Cloud SDK is a set of command-line tools for GCP. You can run these tools interactively or
in your automated scripts. Cloud Shell also provides you with command-line access to Cloud
resources directly from your browser, but without having to install the Cloud SDK or other tools
on your system. The utilities you need are always available up to date and fully authenticated
when you need them.

Cloud SDK and Cloud Shell will be discussed in more detail later in this module. The Cloud
SDK also includes client libraries that enable you to easily create and manage resources. GCP
client libraries expose APIs for two main purposes. App APIs provide access to services. Admin
APIs offer functionality for resource management.

(Refer Slide Time: 02:23)

From a browser, go to console.cloud.google.com. If you haven’t already logged into your Google
account, the system will prompt you to enter your credentials. After you log in, the GCP console
will display the details of your default project.

(Refer Slide Time: 02:40)

All GCP services are accessible through the simple menu button in the top left corner. You can pin frequently used services to this menu. You have the opportunity to explore the GCP Console later through a hands-on lab.

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-10
Understanding Projects

(Refer Slide Time: 00:01)

All GCP resources that you allocate and use must belong to a GCP project. You will learn more about projects in this topic.

(Refer Slide Time: 00:08)

Projects are the basis for enabling and using GCP services like managing APIs, enabling billing, adding and removing collaborators, and enabling other Google services. Each project is a separate account, and each resource belongs to exactly one. Projects can have different owners and users. They are billed separately, and they are managed separately.

(Refer Slide Time: 00:35)

Resource Manager provides ways for you to manage your projects in GCP programmatically. You can access Resource Manager through an RPC API or a REST API. With these APIs, you can get a list of all projects associated with an account, create new projects, update existing projects, and delete projects. You can also undelete, or recover, projects that you want to restore.
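
The same Resource Manager operations are also exposed through the gcloud command-line tool, which wraps these APIs. A minimal sketch, with a hypothetical project ID:

    # List all projects your account can see
    gcloud projects list

    # Create, delete, and then recover a project (IDs must be globally unique)
    gcloud projects create example-project-20345 --name="Example Project"
    gcloud projects delete example-project-20345
    gcloud projects undelete example-project-20345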

(Refer Slide Time: 01:04)

Each GCP project has a name and a project ID that you assign. The project ID is a permanent, unchangeable identifier, and it has to be unique across GCP. While a project ID will be generated automatically, you can edit it. However, this must be done while creating the new project, as it cannot be modified afterwards. In general, project IDs are made to be human-readable strings, and you use them frequently to refer to projects.

(Refer Slide Time: 01:38)

On the other hand, project names are for your convenience, and you can change them. Note that you cannot reuse the project ID of a deleted project.

(Refer Slide Time: 01:50)

GCP also assigns each of your projects a unique project number, and you will see it displayed to you in various contexts. But using it is mostly outside the scope of this course. As you work with GCP, you will use these identifiers in certain command lines and API calls.

(Refer Slide Time: 02:08)

To create a project, click on the name of the current project in the upper left portion of the
screen.

(Refer Slide Time: 02:14)

A list of all current projects will be displayed. Select a new project option on the right-hand side.

(Refer Slide Time: 02:23)

When the new project screen is displayed, give your project a name.

(Refer Slide Time: 02:29)

You have the option to use the auto-generated project ID or create your own by clicking the
EDIT option. Remember, project IDs must be globally unique.

(Refer Slide Time: 02:41)

Select the appropriate billing account, organization, and location.

(Refer Slide Time: 02:47)

Click CREATE to create a new project.

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-11
Billing in GCP

(Refer Slide Time: 00:06)

In the next topic, you will learn how billing works in GCP. Billing is no fun, but it is a fact of life, so let us learn more about it. Billing in GCP is set up at the GCP project level. When you define a GCP project, you link a billing account to it. This billing account is where you will configure all your billing information, including your payment option. You can link your billing account to zero or more projects. Projects that you do not link to any billing account can only use free GCP services.

Your billing account can be charged automatically and invoiced every month or at every threshold limit. You can separate project billing by setting up billing subaccounts. Some GCP customers who resell GCP services use subaccounts for each of their own clients.

(Refer Slide Time: 00:59)

You are probably thinking, how can I make sure I do not accidentally run up a big GCP bill? GCP provides four tools to help: budgets and alerts, billing export, reports, and quotas. I will discuss each of these in more detail next.

(Refer Slide Time: 01:19)

You can define budgets at the billing account level or at the project level. To be notified when costs approach your budget limit, you can create an alert. For example, with a budget limit of $20 and an alert set at 90%, you receive a notification alert when your expenses reach $18. You can also set up a webhook to be called in response to an alert. This webhook can control automation based on billing alerts. For example, you could trigger a script to shut down resources when a billing alert occurs.
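
To make that automation idea concrete, here is a minimal sketch of code that could receive budget notifications. It assumes the budget has been configured to publish alerts to a Cloud Pub/Sub topic and that this function is deployed as a Pub/Sub-triggered Cloud Function (covered later in this course); the field names and the shutdown step are illustrative, not a complete production setup.

    import base64
    import json

    def handle_budget_alert(event, context):
        """Pub/Sub-triggered function: inspect a budget notification.

        event["data"] is a base64-encoded JSON document; budget notifications
        carry fields such as costAmount and budgetAmount.
        """
        payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
        cost = payload.get("costAmount", 0)
        budget = payload.get("budgetAmount", 0)

        if budget and cost > budget:
            # Placeholder: trigger your own automation here, for example a
            # script that stops non-production VMs via the Compute Engine API.
            print("Over budget: spent {} of {}".format(cost, budget))
        else:
            print("Within budget: spent {} of {}".format(cost, budget))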

(Refer Slide Time: 01:55)

Billing export allows you to store detailed billing information in places where it is easy to
retrieve for external analysis, such as a BigQuery data set or Cloud Storage bucket.
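
For example, once billing export to BigQuery is enabled, a query like the following (run here with the bq tool described later in this module) can break down spend by service. The dataset and table names are placeholders; the actual export table name is generated when you configure the export.

    bq query --use_legacy_sql=false '
      SELECT service.description AS service,
             ROUND(SUM(cost), 2)  AS total_cost
      FROM `my_billing_dataset.gcp_billing_export_v1_XXXXXX`
      GROUP BY service
      ORDER BY total_cost DESC'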

(Refer Slide Time: 02:06)

Reports is a visual tool in the console that allows you to monitor expenditure based on a project or on services.

(Refer Slide Time: 02:15)

GCP also implements quotas, which limit unforeseen extra billing charges. Quotas are designed to prevent the over-consumption of resources because of an error or a malicious attack. Quotas apply at the level of the GCP project. There are two types of quotas: rate quotas and allocation quotas. Rate quotas reset after a specific time. For example, by default, the Google Kubernetes Engine service implements a quota of 1,000 calls to its API from each GCP project every 100 seconds.

Allocation quotas govern the number of resources you can have in your projects. For example, the number of GPUs permitted varies by GPU type and region. You can change quotas by requesting an increase from Google Cloud Support. You can also use the console to request a quota change. GCP quotas also protect the community of GCP users by reducing the risk of unforeseen spikes in usage.
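
You can also inspect your current quotas and usage from the command line; a minimal sketch with the gcloud tool, using a placeholder project ID:

    # Project-wide quotas (for example, networks, firewall rules, images)
    gcloud compute project-info describe --project=my-sample-project

    # Regional quotas (for example, CPUs and in-use IP addresses) for one region
    gcloud compute regions describe us-central1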

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-12
Install and Configure Cloud SDK

The cloud SDK enables a user to run GCP command-line tools from a local desktop. Installing
and configuring the cloud SDK is the next topic.

(Refer Slide Time: 00:09)

The Cloud SDK is a set of command-line tools that you can download and install onto a computer of your choice and use to manage resources and applications hosted on GCP. The 'gcloud' CLI manages authentication, local configuration, developer workflow, and interactions with the Cloud Platform APIs. 'gsutil' provides command-line access to manage Cloud Storage buckets and objects. 'bq' allows you to run queries and manipulate datasets, tables, and entities in BigQuery through the command line.

(Refer Slide Time: 00:49)

To install the Cloud SDK on your desktop, go to cloud.google.com/sdk. Select the operating system of your desktop; this will download the SDK for that operating system. Then follow the instructions specific to your operating system.

(Refer Slide Time: 01:10)

After the installation is complete, you will need to configure the Cloud SDK for your GCP environment. Run the 'gcloud init' command; you will be prompted for information including your login credentials, default project, and default region and zone.
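
A short, illustrative session with the three tools just described might look like the following; the project, bucket, and query are hypothetical examples.

    gcloud init                                   # interactive login and defaults
    gcloud config list                            # confirm account, project, region/zone
    gcloud config set project my-sample-project   # switch the default project

    gsutil mb gs://my-sample-bucket-12345         # create a Cloud Storage bucket
    gsutil cp report.csv gs://my-sample-bucket-12345/

    bq query --use_legacy_sql=false 'SELECT 1 AS ok'   # run a trivial BigQuery query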

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-13
Use Cloud Shell

Cloud Shell allows a user to run the Google command-line without installing cloud SDK on a
desktop. In this topic, you will learn how to utilize Cloud Shell.

(Refer Slide Time: 00:11)

But what if it is not convenient to install the Cloud SDK on the machine you are working with? Cloud Shell provides command-line access to your cloud resources directly from within your browser. Using Cloud Shell, you can manage your projects and resources easily without having to install the Cloud SDK or other tools locally. The Cloud SDK command-line tools and other utilities are always available, up to date, and fully authenticated.

So, how does Cloud Shell do that? It's built using a Docker container running on a Compute Engine virtual machine instance that you aren't billed for. Each GCP user has one. Your Cloud Shell virtual machine is ephemeral, which means that it will be stopped whenever you stop using it interactively and will be restarted when you re-enter Cloud Shell. So, you wouldn't want to run a production web server in your Cloud Shell, for example.

You also get 5 gigabytes of persistent disk storage that is reattached for you every time a new Cloud Shell session is started. Cloud Shell also provides web preview functionality and built-in authorization for access to GCP Console projects and resources. You can also use Cloud Shell to perform other management tasks related to your projects and resources using either the gcloud command or other available tools.

(Refer Slide Time: 01:40)

To start Cloud Shell, click on the Activate Cloud Shell icon located at the upper right-hand side of the screen.

(Refer Slide Time: 01:49)

The Cloud Shell terminal will appear in the lower portion of the window. Options, including launching the Cloud Shell code editor and opening Cloud Shell in a new page, can be accessed using the toolbar in the upper right corner of Cloud Shell.

(Refer Slide Time: 02:04)

The Cloud Shell code editor is a tool for editing files inside your Cloud Shell environment in real time within the web browser. This tool is extremely convenient when working with code-first applications or container-based workloads, because you can edit files easily without the need to download and upload changes. You can also use text editors from the Cloud Shell command prompt.

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-14
GCP APIs

(Refer Slide Time: 00:09)

Everything you do in GCP is done with APIs. This topic introduces APIs and discusses how they are used. Let's be precise about what an application programming interface, or API, is. A software service implementation can be complex and changeable. If other software services had to be explicitly coded at that level of detail in order to use that service, the result would be brittle and error-prone. So, instead, application developers structure the software they write so that it presents a clean, well-defined interface that abstracts away needless detail.

And then they document that interface. That's an application programming interface. The underlying implementation can change as long as the interface doesn't, and other pieces of software that use the API don't have to know or care.

(Refer Slide Time: 00:59)

The services that make up GCP offer APIs so that code you write can control them. These APIs can be enabled through the GCP Console and are what is called RESTful; in other words, they follow the Representational State Transfer paradigm. In a broad sense, that means that your code can use Google services in much the same way that web browsers talk to web servers. The APIs identify resources in GCP with URLs. Your code can pass information to the APIs using JSON, which is a very popular way of passing textual information over the web. And there is an open system, OAuth 2, for user login and access control.

GCP APIs also help you control your spend, with most including daily quotas and limits; where needed, quotas and rates can be raised by request.

(Refer Slide Time: 02:08)

In addition to the Cloud SDK, you will also use client libraries that enable you to easily create and manage resources. GCP client libraries expose APIs for two main purposes. App APIs provide access to services, and they are optimized for supported languages such as Node.js and Python. Admin APIs offer functionality for resource management. For example, you can use admin APIs if you want to build your own automated tools.

The different application managed service options will be discussed in more detail later in the
course.
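
As a small illustration of an App API accessed through a client library, the following Python sketch lists Cloud Storage buckets. It assumes the google-cloud-storage library is installed and that credentials are available, for example via the Cloud SDK; the project ID is a placeholder.

    from google.cloud import storage

    # Uses Application Default Credentials; the project ID below is hypothetical.
    client = storage.Client(project="my-sample-project")

    for bucket in client.list_buckets():
        print(bucket.name)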

(Refer Slide Time: 02:51)

The GCP Console includes a tool called the APIs Explorer that helps you learn about the APIs interactively. It lets you see what APIs are available and in what versions. These APIs expect parameters, and documentation on them is built in. You can try the APIs interactively, even with user authentication. Suppose you have explored an API and you are ready to build an application that uses it. Do you have to start coding from scratch?

No. Google provides client libraries that take a lot of the drudgery out of the task of calling GCP from your code.

(Refer Slide Time: 03:29)

In this example, the compute.instances.list method from the Compute Engine API will be tested. Items listed in red are required inputs. When the method is run, you will have to log in using OAuth 2.0. Since REST APIs are HTTP-based, if the method runs correctly, you will receive a 200 message and the appropriate data will be displayed. If the project or the zone was entered incorrectly, you will get a 400 error and no data will be displayed.
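
Outside the APIs Explorer, the same method can be called directly over HTTPS. A minimal sketch with curl, using an access token from the Cloud SDK; the project and zone are placeholders.

    curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      "https://compute.googleapis.com/compute/v1/projects/my-sample-project/zones/us-central1-a/instances"

A successful call returns a 200 status with a JSON list of instances; a bad project or zone produces an error response, as described above.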

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-15
Cloud Console Mobile App

The cloud console mobile app provides another way for you to manage services running on GCP
directly from your mobile device. It’s a convenient resource that doesn’t cost anything extra.

(Refer Slide Time: 00:13)

The cloud console mobile app is available for iOS and Android and offers many capabilities. It
allows you to stay connected to the cloud and check billing, status, and critical issues. To see the
health of your service at a glance, you can create your custom dashboard showing key metrics
such as CPU usage, network usage, requests per second, server errors, and more. You can take
action to address issues directly from your device such as rolling back a bad release, stopping or
restarting a virtual machine, searching logs, or even connecting to a virtual machine via SSH.

The monitoring functionality allows you to view and respond to incidents, errors, and logging. If you need to, you can even access Cloud Shell to perform any gcloud operation.

Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-16
Quiz

(Refer Slide Time: 00:05)

You have reached the end of the module. Complete the short quiz to test your understanding. True or False: All GCP resources must be associated with a project. The answer is true. Associating all resources with a project helps with billing and isolation. Which of the following is a command-line tool that is part of the Cloud SDK? C is the correct answer. The gsutil command-line tool is used to work with Cloud Storage.

What command would you use to set up the default configuration of the cloud SDK? The gcloud
init command is used to set up the user, default project, and a default region and zone of the
SDK.

Google Cloud Computing Foundation Course
Sowmya Kannan
Google Cloud

Lecture-17
Module introduction

Hi, I am Sowmya. Welcome to the module Use GCP to Build Your Apps. In this module, you will focus on leveraging GCP resources and serverless managed services to build applications. So far in this course, you have learned what GCP is and why you should have a solid platform before beginning a GCP transformation. Now in this module, you will learn how to build apps directly in GCP.

(Refer Slide Time: 00:37)

The main objective of this module is to discover the different compute options in GCP. To achieve this goal, you will need to meet the following learning objectives: explore the role of compute options in the cloud; describe how to build and manage virtual machines; explain how to build elastic applications using autoscaling; and explore platform-as-a-service options by leveraging App Engine.

(Refer Slide Time: 01:10)

You will also be able to discuss how to build event-driven services utilizing Cloud Functions, and explain how to containerize and orchestrate applications with Google Kubernetes Engine, also referred to as GKE.

(Refer Slide Time: 01:28)

This agenda shows the topics that make up this module. You will start by learning about compute options in the cloud. You will then move on to finding out how to build and deploy apps using Compute Engine, and how to create a virtual machine by completing a hands-on lab. You will then discover how to configure elastic apps with autoscaling and explore how App Engine can run your applications without you having to manage the infrastructure.

The second lab of the module will allow you to create a small App Engine application that displays a short message. You will then move on to finding out about event-driven programs with Cloud Functions before completing another lab where you will create, deploy, and test a cloud function using the Google Cloud Shell command line. You will finish the module by learning about containerizing and orchestrating apps with Google Kubernetes Engine before ending with a short quiz and a recap of the key learning points from the module.

Google Cloud Computing Foundation Course
Sowmya Kannan
Google Cloud

Lecture-18
Compute Options in the Cloud

(Refer Slide Time: 00:06)

Let's begin by learning about compute options in the cloud. GCP offers a variety of compute services spanning different usage options. For general workloads that require dedicated resources for applications, Compute Engine is a good option. If you are looking for a platform-as-a-service, App Engine is a good option. Cloud Functions offers a serverless option for triggering code to run based on some kind of event.

And to run containers on a managed Kubernetes platform, you can leverage Google Kubernetes Engine. You will find out more about each of these compute services during this module.

Google Cloud Computing Foundation Course
Sowmya Kannan
Google Cloud

Lecture-19
Exploring IaaS with Compute Engine

(Refer Slide Time: 00:07)

Next, you will discover how to build and deploy applications with Compute Engine. Compute Engine delivers virtual machines running in Google's innovative data centers and worldwide fiber network. Compute Engine is ideal if you need complete control over the virtual machine infrastructure or need to make changes to the kernel, such as providing your own network or graphics drivers to squeeze out the last drop of performance. It is also a good fit if you need to run a software package that can't easily be containerized, or if you have existing VM images to move to the cloud.

(Refer Slide Time: 00:43)

Compute Engine is a type of infrastructure-as-a-service. It delivers scalable, high-performance virtual machines that run on Google's infrastructure. Compute Engine VMs boot quickly, come with persistent disk storage, and deliver consistent performance. You can run any computing workload on Compute Engine, such as web server hosting, application hosting, and application backends. Virtual servers are available in many configurations, including predefined sizes.

Alternatively, there is the option to create custom machine types optimized for specific needs. Compute Engine also allows users to run their choice of operating system. And while Compute Engine allows users to run thousands of virtual CPUs in a system that has been designed to be fast and to offer strong performance consistency, there is no upfront investment required. The purpose of custom virtual machines is to ensure you can create virtual servers with just enough resources to work for your application.

For example, you may want to run your application on a virtual machine, but none of the predefined versions fit the resource footprint you require, or your application needs to run on a specific CPU architecture, or GPUs are required to run your application. Custom virtual machines allow you to create a perfect fit for your applications.

(Refer Slide Time: 02:28)

To meet your workload requirements, there are different machine type options that you can consider: for example, a higher proportion of memory to CPU, a higher proportion of CPU to memory, or a blend of both through Google's standard configurations. Compute Engine offers predefined machine types that you can use when you create an instance. A predefined machine type has a preset number of virtual CPUs, or vCPUs, and amount of memory, and is charged at a set price.

You can choose from general-purpose machine types, memory-optimized machine types, and compute-optimized machine types. Predefined virtual machine configurations range from micro instances of 2 vCPUs and 8 gigabytes of memory to memory-optimized instances with up to 160 vCPUs and 3.75 terabytes of memory.

(Refer Slide Time: 03:42)

Compute Engine also allows you to create virtual machines with the vCPU count and memory that meet your workload requirements. This has performance benefits and also reduces cost significantly. One option is to select from predefined configurations: a general-purpose configuration provides a balance between performance and memory, or you can optimize for memory or for performance. You can create a machine type with as little as one vCPU and up to 80 vCPUs, or any even number of vCPUs in between.

You can configure up to 8 gigabytes of memory per vCPU. Alternatively, if none of the predefined virtual machines fit your needs, you have the option to create a custom virtual machine. When you create a custom virtual machine, you can choose the number of vCPUs, the amount of memory required, the CPU architecture to leverage, and the option of using GPUs.
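
For illustration, creating a VM with a predefined machine type and one with a custom shape can be sketched with gcloud as follows; the names, zone, and sizes are arbitrary examples.

    # Predefined machine type: 2 vCPUs and 7.5 GB of memory (n1-standard-2)
    gcloud compute instances create web-vm \
        --zone=us-central1-a \
        --machine-type=n1-standard-2

    # Custom machine type: 4 vCPUs and 20 GB of memory
    gcloud compute instances create custom-vm \
        --zone=us-central1-a \
        --custom-cpu=4 \
        --custom-memory=20GB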

(Refer Slide Time: 04:54)

Network storage up to 64 terabytes in size can be attached to VMs as persistent disks. Persistent disks are the most common storage option due to their price, performance, and durability, and can be created in HDD or SSD formats. If a VM instance is terminated, its persistent disk retains data and can be attached to another instance. You can also take snapshots of your persistent disk and create new persistent disks from that snapshot.
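
The disk lifecycle described above can be sketched with a few gcloud commands; the disk, instance, and snapshot names are hypothetical examples.

    # Create an SSD persistent disk and attach it to an existing VM
    gcloud compute disks create data-disk --size=200GB --type=pd-ssd --zone=us-central1-a
    gcloud compute instances attach-disk web-vm --disk=data-disk --zone=us-central1-a

    # Snapshot the disk, then create a new disk from that snapshot
    gcloud compute disks snapshot data-disk --snapshot-names=data-snap-1 --zone=us-central1-a
    gcloud compute disks create data-disk-copy --source-snapshot=data-snap-1 --zone=us-central1-a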

Compute Engine offers always-encrypted local SSD block storage. Unlike standard persistent disks, local SSDs are physically attached to the server hosting the VM instance, offering very high input-output operations per second and very low latency compared to persistent disks. Predefined local SSD sizes up to 3 terabytes are available for any VM with at least one vCPU. By default, most Compute Engine-provided Linux images will automatically run an optimization script that configures the instance for peak local SSD performance.

Standard persistent disk performance scales linearly up to the VM performance limits. A vCPU count of 4 or more for your instance doesn't limit the performance of standard persistent disks. A vCPU count of less than 4 for an instance reduces the write limit for input-output operations per second, or IOPS, because network egress limits are proportional to the vCPU count. The write limit also depends on the size of the input/outputs, or IOs.

For example, 16-kilobyte IOs consume more bandwidth than 8-kilobyte IOs at the same IOPS level. Standard persistent disk IOPS and throughput performance increase linearly with the size of the disk until it reaches set per-instance limits. The IOPS performance of SSD persistent disks depends on the number of vCPUs in the instance in addition to disk size. Lower-core VMs have lower write IOPS and throughput limits due to the network egress limitations on write throughput.

SSD persistent disk performance scales linearly until it reaches either the limits of the volume or the limits of each Compute Engine instance. SSD read bandwidth and IOPS consistency near the maximum limits largely depends on network ingress utilization. Some variability is to be expected, especially for 16-kilobyte IOs near the maximum IOPS limits.

(Refer Slide Time: 08:18)

Networks connect Compute Engine instances to each other and to the internet. Networks in the cloud have a lot of similarities with physical networks. You can segment networks, use firewall rules to restrict access to instances, and create static routes to forward traffic to specific destinations. You can scale up applications on Compute Engine from zero to full throttle with Cloud Load Balancing. Distribute your load-balanced compute resources in single or multiple regions, close to users, to meet your high availability requirements.

Sub-networks segment your cloud network IP space. Sub-network prefixes can be automatically allocated, or you can create a custom topology. When you build a Compute Engine instance, you use a virtual network adapter, which is part of the instance, to connect the virtual machine to a network, much in the same way you would connect a physical server to a network. For Compute Engine, you can have up to 8 virtual adapters. Sub-networks and Cloud Load Balancing are both discussed in the module "It helps to network".
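
A minimal sketch of these networking pieces with gcloud, using hypothetical names and address ranges:

    # A custom-mode VPC network with one sub-network
    gcloud compute networks create my-vpc --subnet-mode=custom
    gcloud compute networks subnets create my-subnet \
        --network=my-vpc --region=us-central1 --range=10.0.1.0/24

    # A firewall rule restricting inbound access to SSH from one range
    gcloud compute firewall-rules create my-vpc-allow-ssh \
        --network=my-vpc --allow=tcp:22 --source-ranges=203.0.113.0/24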

(Refer Slide Time: 09:49)

All virtual machines are charged for one minute at boot time, which is the minimum charge for a VM. After that, per-second pricing begins, meaning that you only pay for the compute time used. Google offers sustained use discounts, which automatically provide discounted prices for long-running workloads without the need for signup fees or any upfront commitment. Predefined machine types are discounted based on the percent of monthly use, while custom machine types are discounted based on a percent of total use.

The GCP pricing calculator is a great way to see pricing estimates based on the different configuration options that are available, including instances, sole-tenant nodes, persistent disks, load balancing, and Cloud TPUs.

Google Cloud Computing Foundation Course
Sowmya Kannan
Google Cloud

Lecture-20
Configuring Elastic Apps with Autoscaling

(Refer Slide Time: 00:07)

Your next topic looks at building elastic applications with autoscaling. Let's look at how autoscaling works. The autoscaler controls managed instance groups, adding and removing instances using policies. A policy includes the minimum and maximum number of replicas. In this diagram, N is any number of instance replicas based on a template. The template requisitions resources from Compute Engine, identifies an OS image to boot, and starts new VMs.

(Refer Slide Time: 00:42)

The percentage utilization that an additional VM contributes depends on the size of the group. The fourth VM added to a group offers a 25% increase in capacity to the group. The tenth VM added to a group only offers 10% more capacity, even though the VMs are the same size. In this example, the autoscaler is conservative and rounds up. In other words, it would prefer to start an extra VM that isn't needed than to possibly run out of capacity.

In this example, removing one VM does not get close enough to the target of 75% utilization. Removing a second VM would exceed the target. The autoscaler behaves conservatively, so it will shut down one VM rather than two VMs. It would prefer underutilization over running out of resources when they are needed.
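
Putting the pieces together, a managed instance group with an autoscaling policy targeting 75% CPU utilization could be sketched with gcloud as follows; the template and group names, image, and zone are arbitrary examples.

    # Template the autoscaler uses to requisition new VMs
    gcloud compute instance-templates create web-template \
        --machine-type=e2-medium \
        --image-family=debian-11 --image-project=debian-cloud

    # Managed instance group built from that template
    gcloud compute instance-groups managed create web-group \
        --zone=us-central1-a --template=web-template --size=2

    # Autoscaling policy: 2 to 10 replicas, target 75% CPU utilization
    gcloud compute instance-groups managed set-autoscaling web-group \
        --zone=us-central1-a \
        --min-num-replicas=2 --max-num-replicas=10 \
        --target-cpu-utilization=0.75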

Google Cloud Computing Foundation Course
Sowmya Kannan
Google Cloud

Lecture-21
Exploring PaaS with App Engine

(Refer Slide Time: 00:09)

Next, you will explore how App Engine can run applications without you having to manage infrastructure. App Engine allows you to build highly scalable applications on a fully managed serverless platform. App Engine is ideal if time to market is highly valuable to you and you want to be able to focus on writing code without ever having to touch a server, cluster, or infrastructure. It's also ideal if you don't want to worry about a pager going off or receiving 5xx errors. App Engine allows you to have highly available apps without a complex architecture.

(Refer Slide Time: 00:48)

As a fully managed environment, App Engine is a perfect example of a computing platform provided as a service.

(Refer Slide Time: 00:09)

App Engine can save organizations time and cost in software application development by eliminating the need to buy, build, and operate computer hardware and other infrastructure. This includes no server management and no need to configure deployments. This allows engineering teams to focus on creating high-value applications instead of no-value operations work. You can quickly build and deploy applications using a range of popular programming languages, like Java, PHP, Node.js, Python, C#, .NET, Ruby, and Go, or you can bring your own language runtime and frameworks.

App Engine allows you to manage resources from the command line, debug source code in production, and run API backends easily using industry-leading tools such as the Cloud SDK, Cloud Source Repositories, IntelliJ IDEA, Visual Studio, and PowerShell. App Engine also automatically scales depending on the application traffic and consumes resources only when code is running. This allows cost to be kept to a minimum.

(Refer Slide Time: 02:21)

You can run your applications in App Engine using the standard or the flexible environment. You can also choose to use both environments simultaneously and allow your services to take advantage of each environment's individual benefits. The standard environment offers a fully managed infrastructure for your application that can scale down to zero if it is not in use, which means you stop paying for the service. However, your applications must conform to the constrained environment of App Engine standard: only specific versions of a few runtimes are supported, you cannot sign in to the system to make changes, you cannot write to a persistent disk, and the configuration of the environment is limited.

App Engine flexible runs your application in a Docker container environment. You can use any HTTP-based runtime. The virtual machines are exposed, allowing you to log into them and write to persistent disks. However, the system will not scale to zero; you will still pay for the service even if users aren't using the application. Because VM instances in the flexible environment are Compute Engine virtual machines, there are far more options for infrastructure customization, and you are also able to take advantage of a wide array of CPU and memory configurations.

In summary, if you just need a high-performance managed infrastructure and can conform to strict runtime limitations, then App Engine standard is a great option. If you need to use custom runtimes, or if you need a less rigid environment but still want to leverage a platform-as-a-service, then App Engine flexible would be a more suitable option.
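
To make the standard environment concrete, a minimal deployable app is just a small amount of code plus an app.yaml descriptor. This is a sketch only, assuming the Python 3.9 standard runtime with Flask; any supported runtime follows the same pattern.

    # main.py -- a minimal application (Flask must be listed in requirements.txt)
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        return "Hello from App Engine standard!"

    # app.yaml -- the deployment descriptor; this is all the configuration needed here
    runtime: python39

    # Deploy and open the app (run in the directory containing the files above)
    gcloud app deploy
    gcloud app browse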

(Refer Slide Time: 04:26)

The frontend is often critical to the user experience. To ensure consistent performance, a built-in load balancer will distribute traffic to multiple frontends and scale the frontend as necessary. The backend is for more intensive processing. This separation of function allows each part to scale as needed. Note that App Engine services are modular, and this example shows a single service. More complex architectures are possible.

(Refer Slide Time: 05:06)

When using App Engine, you also have multiple options to store application data, including
caching through App Engine Memcache, Cloud Storage for objects up to 5 terabytes in size, Cloud
Datastore for persistent, low-latency storage for serving data to applications, Cloud SQL, which is
a relational database that can store more than one terabyte of data, and Cloud Bigtable, a NoSQL
database for heavy read-write workloads and analysis.

73
(Refer Slide Time: 05:44)

The automatic scaling of App Engine allows you to meet any demand, and load balancing
distributes compute resources in single or multiple regions close to users to meet
high-availability requirements.

(Refer Slide Time: 06:03)

74
App Engine allows you to easily host different versions of your app, which includes creating
development, test, staging, and production environments. Stackdriver gives you powerful
application diagnostics to debug and monitor the health and performance of your app.

(Refer Slide Time: 06:22)

75
And you can leverage robust security tools like Cloud Security Scanner. These services are
provided with high availability and guaranteed redundancy.

76
Google Cloud Computing Foundation Course
Sowmya Kannan
Google Cloud

Lecture-22
Event Driven Programs with Cloud Functions

Cloud Functions is serverless code that runs based on certain events. In this topic,
you will learn how Cloud Functions works.

(Refer Slide Time: 00:12)

Developer agility comes from building systems composed of small independent units of
functionality focused on doing one thing well. Cloud functions let you build and deploy services
at the level of a single function, not at the level of entire application containers or VMs. Cloud
functions are ideal if you need to connect and extend cloud services and want to automate with
event-driven functions that respond to cloud events.

It is also ideal if you want to use open and familiar Node JS, Python, or Go without the need to
manage a server or runtime environment.

77
(Refer Slide Time: 01:02)

A cloud function provides a connective layer of logic that lets you write code to connect and
extend cloud services. You can listen and respond to a file upload to Cloud Storage, a log change,
or an incoming message on a Cloud Pub/Sub topic, and so on. Cloud Functions have access to the
Google service account credential and are therefore seamlessly authenticated with the majority of
GCP services, such as Cloud Datastore, Cloud Spanner, the Cloud Translation API, and the Cloud
Vision API.

Cloud events are things that happen in the cloud environment. These might be things like
changes to data in a database, files added to a storage system, or a new virtual machine instance
being created. Events occur whether or not users choose to respond to them. You can create a response
to an event with a trigger. A trigger is a declaration of interest in a certain event or set of events.
Binding a function to a trigger allows you to capture and act on the events.

A cloud function removes the work of managing servers, configuring software, updating
frameworks, and patching operating systems. We fully manage the software and infrastructure so
that you just add code. Furthermore, the provisioning of resources happens automatically in
response to events. This means that a function can scale from a few invocations a day to many
millions of invocations without any additional work for you.

Events happen all the time within a system, like file uploads to Cloud Storage, changes to database
records, requests to HTTP endpoints, and so on. When you write code that runs in response to those
events, Cloud Functions runs it while automatically managing any underlying infrastructure. Cloud
Functions connect and extend cloud services with code, so you can treat them as building
blocks and adjust them as your needs change. You can also extend your application using a
broad ecosystem of third-party services and APIs.

(Refer Slide Time: 03:41)

A cloud service emits some kind of event. This can be a Pub/Sub message, a change to a Cloud
Storage object, or a webhook, for example. The event kicks off a cloud function, which
can be written in Node.js, Python, or Go. The function can invoke other services and write back
the results. Building infrastructure is not required when leveraging Cloud Functions.
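
As an illustration, the sketch below shows what such an event-driven function might look like in Python. The function name, bucket, and deployment flags are assumptions made for the example; it reacts to a new object being finalized in a Cloud Storage bucket.

# main.py - a minimal background Cloud Function sketch (Python runtime).
# The function name and trigger are illustrative; it assumes deployment with
# a Cloud Storage trigger, for example:
#   gcloud functions deploy process_upload --runtime python39 \
#       --trigger-resource YOUR_BUCKET --trigger-event google.storage.object.finalize
def process_upload(event, context):
    """Triggered when a new object is finalized in a Cloud Storage bucket.

    Args:
        event (dict): metadata about the Cloud Storage object.
        context: event metadata such as the event ID and timestamp.
    """
    print(f"File uploaded: gs://{event['bucket']}/{event['name']}")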

79
Google Cloud Computing Foundation Course
Sowmya Kannan
Google Cloud

Lecture-23
Containerizing and Orchestrating Apps with GKE

(Refer Slide Time: 00:06)

In this final topic, you will learn how to leverage Google Kubernetes Engine. You've already
discovered the spectrum between infrastructure-as-a-service and platform-as-a-service, and you
have learned about Compute Engine, which is the infrastructure-as-a-service offering of GCP,
with access to servers, file systems, and networking. Now you will see an introduction to
containers and GKE, which is a hybrid that conceptually sits between the two.

It offers the managed infrastructure of infrastructure-as-a-service with the developer orientation
of platform-as-a-service. GKE is ideal for those that have been challenged when deploying or
maintaining a fleet of VMs and have determined that containers are the solution.

80
(Refer Slide Time: 00:56)

It's also ideal when organizations have containerized their workloads and need a system on which
to run and manage them, and do not have dependencies on kernel changes or on a specific non-
Linux operating system. With GKE, there is no need to ever touch a server or infrastructure. So,
how does containerization work?

(Refer Slide Time: 01:26)

81
Infrastructure-as-a-service allows you to share compute resources with other developers by
virtualizing the hardware using virtual machines. Each developer can deploy their own operating
system, access the hardware, and build their applications in a self-contained environment with
access to their own runtimes and libraries, as well as their own partitions of RAM, file systems,
networking interfaces, and so on. You have your tools of choice on your own configurable
system.

So you can install your favorite runtime, web server, database, or middleware, configure the
underlying system resources such as disk space, disk I/O, or networking, and build as you like. But
flexibility comes with a cost. The smallest unit of compute is an app with its VM; the guest OS
may be large, even gigabytes in size, and take minutes to boot. As demand for your application
increases, you have to copy an entire VM and boot the guest OS for each instance of your app,
which can be slow and costly.

(Refer Slide Time: 02:44)

A platform-as-a-service provides hosted services and an environment that can scale workloads
independently. All you do is write your code in self-contained workloads that use these services
and include any dependent libraries. Workloads do not need to represent entire applications.
They are easier to decouple because they're not tied to the underlying hardware, operating system,
or much of the software stack that you used to manage.

82
(Refer Slide Time: 03:24)

As demand for your app increases, the platform scales your app seamlessly and independently by
workload and infrastructure. It scales rapidly and encourages you to build your applications as
decoupled microservices that run more efficiently, but you would not be able to fine-tune the
underlying architecture to save cost.

(Refer Slide Time: 03:52)

83
That is where containers come in. The idea of a container is to give you the independent
scalability of workloads of a platform-as-a-service and an abstraction layer of the operating
system and hardware of an infrastructure-as-a-service. It only requires a few system calls to
create, and it starts as quickly as a process. All you need on each host is an OS kernel that
supports containers and a container runtime.

In a sense, you're virtualizing the operating system. It scales like platform-as-a-service but gives
you nearly the same flexibility as infrastructure-as-a-service. Containers provide an
abstraction layer of the hardware and operating system: an invisible box with configurable
access to isolated partitions of the file system, RAM, and networking, as well as a fast startup with
only a few system calls.

(Refer Slide Time: 04:58)

Using a common host configuration, you can deploy hundreds of containers on a group of
servers. If you want to scale, for example, a web server, you can do so in seconds and deploy any
number of containers depending on the size of your workload on a single host or a group of
hosts. You will likely want to build your applications using lots of containers, each performing
their own function like microservices.

84
If you build them this way and connect them with network connections, you can make them
modular, deploy them easily, and scale them independently across a group of hosts. And the hosts
can scale up and down and start and stop containers as demand for your app changes or as hosts
fail. With a cluster, you can connect containers using network connections, build code modularly,
deploy easily, and scale containers and hosts independently for maximum efficiency and
savings.

Kubernetes is an open-source container orchestration tool that you can use to simplify the
management of containerized environments. You can install Kubernetes on a group of your own
managed servers or run it as a hosted service in GCP on a cluster of managed Compute Engine
instances called Google Kubernetes Engine. Kubernetes makes it easy to orchestrate many
containers on many hosts, scale them as microservices, and deploy rollouts and rollbacks.

Kubernetes was built by Google to run applications at scale. Kubernetes lets you install the
system on local servers or in the cloud, manage container networking and storage, deploy rollouts
and rollbacks, and monitor and manage container and host health.

(Refer Slide Time: 07:03)

Just like shipping containers, software containers make it easier for teams to package, manage,
and ship their code. They write software applications that run in a container. The container
provides the operating system needed to run their application. The container will run on any
container platform. This can save a lot of time and cost compared to running servers or virtual
machines. Like a virtual machine imitates a computer, a container imitates an operating system.

Everything at Google runs on containers: Gmail, Web Search, Maps, MapReduce, batch processing,
the Google File System, Colossus, even Cloud Functions and VMs run in containers. Google launches
over 2 billion containers per week. Docker is the tool that puts the application and everything it needs
in the container. Once the application is in a container, it can be moved anywhere that will run
Docker containers: any laptop, server, or cloud provider.

This portability makes code easier to produce, manage, troubleshoot, and update. For service
providers, containers make it easy to develop code that can be ported to the customer and back.
Kubernetes is an open-source container orchestration tool for managing a cluster of Docker
Linux containers as a single system. It can be run in the cloud and in on-premises environments. It
is inspired and informed by Google's experiences and internal systems.

(Refer Slide Time: 08:58)

GKE is a managed environment for deploying containerized apps. It brings Google's latest
innovations in developer productivity, resource efficiency, automated operations, and open-source
flexibility to accelerate time to market. GKE is a powerful cluster manager and
orchestration system for running Docker containers in Google Cloud. GKE manages containers
automatically based on specifications such as CPU and memory.

It is built on the open-source Kubernetes system, making it easy for users to orchestrate
container clusters or groups of containers. Because it is built on the open-source Kubernetes
system, it provides customers the flexibility to take advantage of on-premises, hybrid, or public
cloud infrastructure.
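
As a small illustration of working with a GKE cluster programmatically, the sketch below uses the open-source Kubernetes Python client to list the pods running in a cluster. It assumes the cluster already exists and that credentials have been fetched with gcloud container clusters get-credentials; the names involved are placeholders.

# A minimal sketch using the official Kubernetes Python client ("kubernetes"
# package) to inspect workloads in a GKE cluster. It assumes kubectl
# credentials have already been fetched, for example with
# "gcloud container clusters get-credentials CLUSTER_NAME".
from kubernetes import client, config

def list_pods():
    # Load the kubeconfig written by gcloud / kubectl.
    config.load_kube_config()
    v1 = client.CoreV1Api()
    # List every pod the cluster is currently running, across namespaces.
    for pod in v1.list_pod_for_all_namespaces().items:
        print(f"{pod.metadata.namespace}/{pod.metadata.name}: {pod.status.phase}")

if __name__ == "__main__":
    list_pods()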

87
Google Cloud Computing Foundation Course
Sowmya Kannan
Google Cloud

Lecture-24
Summary

(Refer Slide Time: 00:11)

That concludes the module "Use GCP to build your apps". Here is a reminder of what you have
learned. You began by learning that there are four different compute options in the cloud to
choose from: Compute Engine, App Engine, Cloud Functions, and Google Kubernetes Engine.
You also found out that Compute Engine, which is an infrastructure-as-a-service, delivers virtual
machines via Google's data centers and global fiber network.

Next, you saw how autoscaling controls managed instance groups, and learned more about how
App Engine is a service that allows users to focus on writing code and not infrastructure.

88
(Refer Slide Time: 00:49)

You discovered more about Cloud Functions, a serverless option that connects cloud services
with event-driven functions that respond to cloud events. And finally, you found out that Google
Kubernetes Engine, otherwise known as GKE, is a managed environment for deploying
containerized apps.

89
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-25
Module introduction

Hi, I am Priyanka, and welcome to the module "Where do I store this stuff?" In this module, you
will see how you can take advantage of managed storage and databases for cloud applications.
So far in this course, you have learned what GCP is, why you should have a solid platform
before beginning a migration to GCP, and how GCP supports the building of apps. In this module,
you will learn about storage and database options in GCP.

(Refer Slide Time: 00:28)

The objective of this module is for you to be able to implement a variety of structured and
unstructured storage models. To achieve this goal, you will need to meet the following learning
objectives. Discuss the different storage options that exist in the cloud. Differentiate between
structured and unstructured storage in the cloud, compare the role of the different cloud storage
options, and explore the use case for relational versus NoSQL storage.

90
(Refer Slide Time: 00:56)

You will also describe how to leverage Cloud Storage as unstructured storage, explain the relational
database options in the cloud, and describe the NoSQL options in GCP.

(Refer Slide Time: 01:07)

These are the topics that make up the module. First, you will learn about the different storage
options available in the cloud. Then you will see the difference between structured and
unstructured storage and how you can leverage unstructured storage using Cloud Storage. A
hands-on lab will allow you to explore a range of activities applicable to Cloud Storage. You will then
review the use case for SQL managed services. Next, you will explore Cloud SQL.

And then, complete another lab where you will import data into cloud SQL and perform basic
data analysis. You will discover how to leverage Cloud Spanner and explore the available
NoSQL options. You will then learn how to use Cloud Datastore as a NoSQL document store.
You will complete the module with an app development activity that requires you to store app
data in Cloud Datastore, and then you will learn how Cloud BigTable can be leveraged as a
NoSQL option. The module will end with a short quiz and a recap of the key learning points
covered in the module.

92
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-26
Storage Options in the Cloud

Let's start with the first topic: the different storage options that exist in the cloud. GCP offers many
different storage options, from object stores to database services. These options help to save costs,
reduce the time it takes to launch, and make the most of your datasets by allowing you to analyze a
wide variety of data.

(Refer Slide Time: 00:21)

All applications create and use data. We provide all the tools you need to build and move
applications to the cloud. This includes storing videos, images, and other objects, and even
operational application data. Applications in many cases require storage solutions, and GCP
provides managed services that are scalable, reliable, and easy to operate. For relational
databases, which are commonly used today, we offer Cloud SQL for MySQL and PostgreSQL, as well
as Cloud Spanner.

We also have non-relational, or NoSQL, databases like Cloud Datastore and Cloud Bigtable.
BigQuery is a highly scalable enterprise data warehouse and sits outside the storage solutions
discussed in this module.

(Refer Slide Time: 01:09)

There are three common use cases for cloud storage. The first is content storage and delivery.
This is where you have content such as images or videos and you need to serve the content to users
wherever they are. People want their content fast, and running on the global network that
Google provides ensures a positive experience for users. The second use case is storage for
data analytics and general compute.

You can process or expose your data to analytics tools, like the analytics stack of products that
GCP offers, and do things like genomic sequencing or Internet of Things data analysis. The
third use case is backup and archival storage. You can save storage costs by migrating
infrequently accessed content to cheaper cloud storage options. It's also critical to have a copy
in the cloud for recovery purposes, just in case anything happens to your data on-premises.

94
(Refer Slide Time: 02:04)

If you work with databases, we have two priorities for you. The first priority is to help you
migrate existing databases to the cloud and move them to the right service. You will likely be
moving MySQL or PostgreSQL databases to Cloud SQL. The second priority is to help you innovate,
build or rebuild for the cloud, take advantage of mobile, and plan for future growth.

95
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-27
Structured and Unstructured Storage in the Cloud

Now you will see the difference between structured and unstructured storage and how to choose
between them in the cloud.

(Refer Slide Time: 00:08)

Structured data is what most people are used to working with and typically fits within columns
and rows in spreadsheets and relational databases. You can expect this data to be organized and
clearly defined, and it is usually easy to capture and analyze. Examples of structured data
would include names, addresses, contact numbers, dates, and billing information. The benefit of
structured data is that it can be understood by programming languages and can be
manipulated relatively quickly.

It is estimated that around 80% of all data is unstructured. It's far more difficult to process or
analyze unstructured data using traditional methods, as there's no internal identifier to enable
search functions to identify it. Unstructured data often includes text and multimedia content, for
example email messages, documents, photos, videos, presentations, web pages, and so on.
Organizations are focusing increasingly on mining unstructured data for insights that provide them
with a competitive edge.

(Refer Slide Time: 01:18)

This flowchart shows the decision tree for determining, based on the use case, which storage
solution should be utilized. If you need a solution to hold files, backups, logs, or blobs, for example,
a good unstructured solution would be Cloud Storage. If you need a structured solution for an
analytics workload, BigQuery and Cloud Bigtable are two options. If a relational
database is needed, then you can choose either a traditional managed MySQL or PostgreSQL
database using Cloud SQL, or a horizontally scalable, highly available database like Cloud
Spanner.

And if you need a simple NoSQL option to use for your application, Cloud Datastore is a solid
choice. In the topics that follow, you will learn more about each option.

97
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-28
Unstructured Storage using Cloud Storage

(Refer Slide Time: 00:07)

In this topic, you will consider how you can leverage unstructured storage using Cloud Storage.
Cloud Storage is just one of the many storage options on GCP and is used to store and serve object
data, also known as blob data.

98
(Refer Slide Time: 00:18)

You can store an unlimited number of objects in Cloud Storage, each up to 5 terabytes in size. Cloud
Storage is well suited for binary or object data such as images, media serving, and backups.
Cloud Storage is the same storage that we use for images in Google Photos, Gmail attachments,
Google Docs, and so on. Users have a variety of storage requirements for a multitude of use
cases. To cater to these requirements, we offer different classes of Cloud Storage. The classes are
based on how often the data is accessed.

(Refer Slide Time: 00:50)

99
Multi-regional storage costs a bit more, but it is geo-redundant. That means you pick a broad
geographical location, like the United States, the European Union, or Asia, and Cloud Storage
stores your data in at least two geographic locations separated by at least 160 kilometers. This
option is ideal for storing data that is frequently accessed around the world, such as serving
website content, streaming videos, or gaming and mobile applications.

Regional storage lets you store your data in a specific GCP region, for example, us-central1,
europe-west1, or asia-east1. It's cheaper than multi-regional storage, but it offers less
redundancy. This option is ideal for data analytics and machine learning jobs. Nearline storage is
a low-cost, highly durable storage service for storing infrequently accessed data. This storage
class is a better choice than multi-regional storage or regional storage in scenarios where you
plan to read or modify your data on average once a month or less.

For example, if you continuously add files to Cloud Storage and plan to access these files
once a month for analysis, Nearline storage is a great choice. Typical uses for this storage class
include long-tail content, multimedia source file storage, and online backups. Coldline storage
is a very low-cost, highly durable storage service for data archiving, online backups, and disaster
recovery. Coldline storage is the best choice for data that you plan to access at most once a year.

This is due to its slightly lower availability, 90-day minimum storage duration, costs for data
access, and higher per-operation costs. Typical use cases include archived data, data with lengthy
storage durations from legal or regulatory requirements, tape migrations, and disaster recovery.
Cloud Storage is unique in a number of ways. It has a single API, millisecond data access
latency, and eleven nines of durability across all storage classes.

Cloud Storage also offers object lifecycle management, which uses policies to automatically
move data to lower-cost storage classes as it is accessed less frequently throughout its life.

100
(Refer Slide Time: 03:22)

Cloud Storage files are organized into buckets. When you create a bucket, you give it a globally
unique name. You specify a geographic location where the bucket and its contents are stored,
and you choose one of the default storage classes that you were introduced to earlier. There are
several ways to control users' access to your objects and buckets. For most purposes, Cloud IAM
is sufficient. Roles are inherited from project to bucket to object.

If you need finer control, you can create access control lists. ACLs define who has access to
your buckets and objects, as well as what level of access they have. Each ACL consists of two
pieces of information: a scope, which defines who can perform the specified actions, and a
permission, which defines what actions can be performed, for example, read or write. If you want,
you can turn on object versioning on your buckets.

Cloud Storage then keeps a history of modifications, that is, overwrites or deletes, for all objects in the
bucket. You can list the archived versions of an object, restore an object to an older state, or
permanently delete a version as needed. If you do not turn on object versioning, new versions always
overwrite old ones. Cloud Storage also offers lifecycle management policies. For example, you could
tell Cloud Storage to delete objects older than 365 days, or to delete objects created before January
1st, 2013, or to keep only the three most recent versions of each object in a bucket that has
versioning enabled.
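
The sketch below shows, using the google-cloud-storage Python client library, how an object might be uploaded, versioning turned on, and object versions listed. The bucket and object names are placeholders, and the bucket is assumed to exist already.

# A minimal sketch using the google-cloud-storage Python client library.
# The bucket and file names are placeholders.
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("my-example-bucket")   # assumed to exist already

# Upload an object (blob) into the bucket.
blob = bucket.blob("backups/report.pdf")
blob.upload_from_filename("report.pdf")

# Turn on object versioning so overwrites and deletes are kept as history.
bucket.versioning_enabled = True
bucket.patch()

# List all versions of the objects in the bucket.
for version in client.list_blobs("my-example-bucket", versions=True):
    print(version.name, version.generation)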

102
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-29
SQL Managed Services

In this topic, we will discuss the use case for SQL managed services, but before we do, let's revisit
what a database is and how it is used.

(Refer Slide Time: 00:10)

A database is a collection of information that is organized so that it can easily be accessed and
managed. Users build software applications using databases to answer business questions, such as
buying a ticket, filing an expense report, storing a photo, or storing medical records.

103
(Refer Slide Time: 00:28)

Computer applications use databases to quickly answer questions like: What is this user's name, given
their sign-in information, so I can display it? What is the cost of this product, so I can show it on
my dynamic web page? What were my top 10 best-selling products this month? Or what is the
next ad I should show the user currently browsing my site? Apps must be able to write data in and
read data out from databases. When a database is used, it is usually run by a computer
application.

So when it is said that a database is useful for X, it is usually because it is designed to make
answering a question simple, fast, and efficient for the app.

104
(Refer Slide Time: 01:14)

Relational database management systems, abbreviated RDBMS, or just relational databases, are used
extensively and are the kind of databases you will encounter most of the time. They are organized based
on the relational model of data. Because they make use of Structured Query Language, relational
databases are sometimes called SQL databases. Relational databases are very good when you have a
well-structured data model and when you need transactions and the ability to join data across tables
to retrieve complex combinations of your data.

(Refer Slide Time: 01:50)

105
GCP offers two managed relational database services. Cloud SQL is a managed MySQL or
PostgreSQL database. When setting up Cloud SQL with replicas, replication is automatic. The
infrastructure of this database is managed, which includes backups, updates, failovers,
and maintenance. You are still responsible for users, schema, and data management. The
database can scale horizontally for reads using replicas.

For writes, you have to make the database instance larger, which usually requires a restart. Cloud Spanner
is a strongly consistent, horizontally scalable, managed relational database. This service can run
your database across multiple nodes, either within a single region or across multiple regions. The
managed service automatically replicates data to the nodes. Spanner is ANSI SQL 2011
compliant, with extensions.

106
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-30
Exploring Cloud SQL
(Refer Slide Time: 00:03)

Let's now explore Cloud SQL in more detail. Cloud SQL is a fully managed relational database
service that makes it easy to set up, maintain, manage, and administer relational MySQL and
PostgreSQL databases in the cloud. This allows you to focus on your application. Cloud SQL is
perfect for WordPress sites, e-commerce applications, CRM tools, geospatial applications, and
any other application that is compatible with MySQL, PostgreSQL, or SQL Server.

107
(Refer Slide Time: 00:35)

Cloud SQL helps you by taking care of administrative tasks. It does not require any software
installation, and it manages all backups, replication, patches, and updates. This is achieved with greater
than 99.95% availability anywhere in the world.

(Refer Slide Time: 00:53)

108
To cater for workloads with different performance demands, Cloud SQL scales up to 64 processor
cores and more than 400 gigabytes of RAM per instance. Cloud SQL also provides you with up
to 10 terabytes of storage capacity.

(Refer Slide Time: 01:09)

For isolation from failure, high availability provides continuous health checking and
automatically fails over if an instance isn't healthy. You can also easily configure replication
and backups to protect your data. Data is automatically encrypted, and Cloud SQL meets
applicable compliance requirements.

109
(Refer Slide Time: 01:30)

Cloud SQL instances are accessible from almost any application, anywhere. You can easily
connect from App Engine, Compute Engine, and your workstation. If your app works with MySQL or
PostgreSQL, it will work with Cloud SQL. It is also easy to move and migrate data through
built-in migration tools and standard connection drivers and tools such as MySQL Workbench.
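
For example, a minimal sketch of connecting to a Cloud SQL for MySQL instance with a standard MySQL driver might look like the following. It assumes the Cloud SQL Proxy is already running locally on 127.0.0.1:3306; the user, password, database, and table names are placeholders.

# A minimal sketch connecting to a Cloud SQL for MySQL instance with the
# standard PyMySQL driver, via a locally running Cloud SQL Proxy.
import pymysql

connection = pymysql.connect(
    host="127.0.0.1",        # the Cloud SQL Proxy endpoint (placeholder setup)
    user="app_user",
    password="app_password",
    database="orders",
)

try:
    with connection.cursor() as cursor:
        # Ordinary SQL works unchanged against Cloud SQL.
        cursor.execute("SELECT id, total FROM invoices WHERE total > %s", (100,))
        for row in cursor.fetchall():
            print(row)
finally:
    connection.close()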

110
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-31
Cloud Spanner as a Managed Service

(Refer Slide Time: 00:06)

In this topic, you will explore how Cloud Spanner can be leveraged as a managed service. Like
Cloud SQL, Cloud Spanner aligns with relational database requirements. The key difference is that
Cloud Spanner combines the benefits of a relational database structure with non-relational
horizontal scale. Vertical scaling is where you make a single instance larger or smaller;
horizontal scaling is where you scale by adding and removing servers.

What makes Cloud Spanner unique is that it is a relational database that scales horizontally. Cloud
Spanner users are often in the advertising, finance, and marketing technology industries, where the
need exists to manage end-user metadata.

111
(Refer Slide Time: 00:46)

Most databases today require making tradeoffs between scale and consistency. With Cloud
Spanner, you get the best of relational database structure and non-relational database scale and
performance, with strong external consistency across rows, regions, and continents. This means
that Cloud Spanner can scale to very large database sizes while still giving IT and developers the
familiarity they are used to with other relational databases, such as MySQL, PostgreSQL, or
proprietary databases. Cloud Spanner is strongly consistent: data added or updated in any
location is immediately available regardless of the location it is accessed from.

Cloud Spanner also dramatically reduces the operational overhead needed to keep the database
online and serving traffic. Users often move to Cloud Spanner from sharded MySQL deployments
and expensive proprietary solutions.

112
(Refer Slide Time: 01:46)

Cloud Spanner scales horizontally and serves data with low latency while maintaining transactional
consistency and industry-leading five-nines availability, that is, less than five minutes of
downtime per year. Cloud Spanner can scale to arbitrarily large database sizes to help avoid
rewrites and migrations. The use of multiple databases or sharded databases as an alternative
solution introduces unnecessary complexity and cost.

Cloud Spanner allows you to create or scale a globally replicated database for mission-
critical apps with a handful of clicks. Synchronous replication and maintenance are also
automatic and built in. Cloud Spanner is a relational database with full relational semantics,
ACID transactions, and schema changes performed as an online operation with no planned
downtime. You can reuse existing SQL skills to query data in Cloud Spanner using familiar,
industry-standard ANSI 2011 SQL.

Enterprise-grade security includes data-layer encryption by default, in transit and at rest,
granular identity and access management, and audit logging.
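
As a small illustration of reusing SQL skills against Cloud Spanner, the sketch below uses the google-cloud-spanner Python client to run a query. The instance, database, and table names are placeholders and are assumed to exist already.

# A minimal sketch using the google-cloud-spanner Python client to run an
# ANSI SQL query. Instance, database, and table names are placeholders.
from google.cloud import spanner

client = spanner.Client()
instance = client.instance("my-instance")
database = instance.database("my-database")

# Reads use a snapshot, which returns strongly consistent results.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT SingerId, FirstName, LastName FROM Singers ORDER BY LastName"
    )
    for row in results:
        print(row)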

113
(Refer Slide Time: 03:04)

So, how does Cloud Spanner work? Data is automatically and instantly copied across
regions. This is called synchronous replication. As a result, queries always return consistent
and ordered answers, regardless of the region. Google uses replication within and
across regions to achieve availability, so if one region goes offline, your data can still be served
from another region.

114
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-32
No SQL Managed Services Options

(Refer Slide Time: 00:05)

Now, you will explore the NoSQL managed service options currently available. Google offers
two managed NoSQL database options. The first one is Cloud Datastore. This is a fully managed,
serverless NoSQL document store that supports ACID transactions. The second one is Cloud
BigTable. This is a petabyte-scale, sparse, wide-column NoSQL database that offers extremely
low write latency. These managed services are discussed in more detail in the topics that follow.

115
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-33
Cloud Datastore a NoSQL Document Store

(Refer Slide Time: 00:05)

In this topic, you will discover how Cloud Datastore can be used as a NoSQL document store.
Cloud Datastore is a highly scalable NoSQL database ideal for rapid and flexible web and
mobile development. Cloud Datastore is a schemaless database; this means that it does not rely
on a schema the way a relational database does. Cloud Datastore is therefore ideal if you have non-
relational data and want a serverless database without having to worry about nodes or cluster
management.

Cloud Datastore isn't a full SQL database and isn't an effective storage solution for data being
used for analytics.

116
(Refer Slide Time: 00:41)

Cloud Datastore allows you to change your data structure as your application evolves, so there is
no need to perfect your data model at the beginning of your project. With NoSQL, storing new
properties in your data requires no database or schema changes. Cloud Datastore serves high-speed
queries no matter how big your database grows, to ensure that your application maintains high
performance. Cloud Datastore uses Google Query Language, or GQL, which, because it is a query
language with a SQL-like syntax, is both familiar and easy to learn.

Complex queries are enabled with secondary indexes, called built-in and composite indexes in Cloud
Datastore. It automatically scales to support millions of API requests per second and hundreds of
terabytes of data, so no configuration or capacity planning is needed. Cloud Datastore is fully
managed by Google, so you can instantly provision a scalable and available NoSQL database
without the hassle of spinning up virtual machines and maintaining databases.

Cloud Datastore automatically handles sharding and replication across multiple data centers to
provide a database that is highly available and durable. This allows users to focus on application
development. With the RESTful interface of Cloud Datastore, data can be easily accessed by any
deployment target. You can build solutions that run on App Engine, GKE, and Compute Engine
and use Cloud Datastore as their integration point.

117
Cloud Datastore automatically encrypts all data before it is written to disk and automatically
decrypts the data when it is read by an authorized user.
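
The sketch below illustrates the basic Cloud Datastore workflow with the google-cloud-datastore Python client: saving a schemaless entity and querying it back. The kind and property names are placeholders chosen for the example.

# A minimal sketch using the google-cloud-datastore Python client. The kind
# and property names are placeholders.
from google.cloud import datastore

client = datastore.Client()

# Create and save an entity; no schema has to be declared in advance.
key = client.key("UserProfile", "alice")
profile = datastore.Entity(key=key)
profile.update({"display_name": "Alice", "level": 42, "themes": ["dark", "compact"]})
client.put(profile)

# Query entities of that kind with a simple property filter.
query = client.query(kind="UserProfile")
query.add_filter("level", ">=", 10)
for entity in query.fetch():
    print(entity.key.name, dict(entity))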

(Refer Slide Time: 02:31)

Cloud Datastore is the best choice for shifting data requirements without needing downtime,
such as user game profiles, where flexibility enables the rapid development of new features. It's
also ideal for storing user profiles to deliver a customized experience based on past activities and
preferences. Cloud Datastore enables true data hierarchy through the ancestor path. This means
that related data can be strongly grouped together, making it exceptional for tasks such as
storing product reviews or an online product catalog that provides real-time inventory
and product details for a customer.

Cloud Datastore is well suited for recording transactions based on ACID properties, for
example, transferring funds from one bank account to another. For mobile games, Cloud
Datastore provides a durable key-value store that allows player data to be efficiently stored and
accessed. Its scalability accommodates the growth of games, whether there are ten players or a
hundred million.

118
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-34
Cloud Bigtable as a NoSQL Option

(Refer Slide Time: 00:07)

The final topic of this module describes how to leverage Cloud BigTable as a NoSQL option.
Cloud BigTable aligns with non-relational database requirements and is a high-performance
NoSQL database service for large analytical and throughput-intensive operational workloads.
It is designed for very large amounts of data. It is great for the Internet of Things, user analytics,
financial data analysis, time-series data, and graph data.

Cloud BigTable is also an option if support isn't required for ACID transactions or if the data
isn't highly structured.

119
(Refer Slide Time: 00:40)

Cloud BigTable is the same database that powers many of Google's core services, including
Analytics, Search, Maps, and Gmail. BigTable was an internal Google database system. It was so
revolutionary that it kick-started the NoSQL industry. We wanted to build a database that
could deliver real-time access to petabytes of data. The result was BigTable, and in 2006 we
released a research paper describing it.

This paper was later recognized as one of the most influential papers of the previous decade. It
gave people outside Google ideas that led to the creation of popular NoSQL databases. In
2015, Cloud BigTable became available as a service, so you can use it for your own applications.

120
(Refer Slide Time: 01:29)

Cloud BigTable offers high performance under high load. For that reason, large apps and
workloads are faster, more reliable, and more efficient running on Cloud BigTable. Cloud
BigTable is ideal for storing large amounts of data with very low latency. Databases can
automatically and seamlessly scale to billions of rows and thousands of columns, allowing you to
store petabytes of data.

Changes to the deployment configuration are immediate, so there is no downtime during
reconfiguration. Replication adds high availability for live serving apps and workload isolation for
serving versus analytics. Because BigTable is a fully managed service, there is no need to worry
about configuring and tuning your database for performance or scalability. Data backups are also
created to protect against catastrophic events and allow for disaster recovery. You can use
Cloud BigTable for a range of applications, from real-time ad analytics to tracking millions of
readings from thousands of Internet of Things sensors.

Because Cloud BigTable is compatible with industry standards and tools like HBase, Hadoop,
BigQuery, and Cloud Dataflow, it's easy to put your data to work for your apps. In terms of
security, all data in Cloud BigTable is encrypted both in-flight and at rest, while access to Cloud
BigTable data is easily controlled through Cloud IAM permissions.

(Refer Slide Time: 03:03)

As Cloud BigTable is part of the GCP ecosystem, it can interact with other GCP services and third-
party clients. From an application API perspective, data can be read from and written to Cloud
BigTable through a data service layer like managed VMs, the HBase REST server, or a Java server
using the HBase client. Typically, this will be to serve data to applications, dashboards, and data
services. Data can also be streamed in through a variety of popular stream processing
frameworks like Cloud Dataflow streaming, Spark Streaming, and Storm.

If streaming is not an option, data can also be read from and written to Cloud BigTable through
batch processes like Hadoop MapReduce, Cloud Dataflow, or Spark. Often, summarized or
newly calculated data is written back to Cloud BigTable or to a downstream database.
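
As a minimal illustration of the application API perspective described above, the sketch below uses the google-cloud-bigtable Python client to write and read a single row. The project, instance, table, and column-family names are placeholders and are assumed to exist already.

# A minimal sketch using the google-cloud-bigtable Python client to write and
# read a single row. Project, instance, table, and column-family names are
# placeholders.
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("sensor-readings")

# Write one cell: row key -> column family "stats", column "temp".
row = table.direct_row(b"sensor#42#2024-01-01T00:00:00")
row.set_cell("stats", b"temp", b"21.5")
row.commit()

# Read the same row back and print the stored value.
read = table.read_row(b"sensor#42#2024-01-01T00:00:00")
cell = read.cells["stats"][b"temp"][0]
print(cell.value)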

122
(Refer Slide Time: 04:00)

This diagram shows a simplified architecture of Cloud BigTable. It illustrates that processing,
which is done through a front-end server pool and nodes, is handled separately from the storage.
A Cloud BigTable table is sharded into blocks of contiguous rows, called tablets, to balance the
query workload. Tablets are similar to HBase regions. Tablets are stored in Colossus, which is
Google's file system, in sorted string table, or SSTable, format. An SSTable provides a
persistent, ordered, immutable map from keys to values, where both keys and values are
arbitrary byte strings.

123
(Refer Slide Time: 04:43)

This chart shows that as the required queries per second increase, the number of nodes required
increases too. Throughput scales linearly, so for every single node that you add, you will see a
linear increase in throughput performance, up to hundreds of nodes.

124
Google Cloud Computing Foundation Course
Priyanka Vergardia
Google Cloud

Lecture-35
Summary

(Refer Slide Time: 00:05)

That concludes the module "Where do I store this stuff?" Here is a reminder of what you learned.
You began by learning that there are three common use cases for cloud storage, including
content storage and delivery, storage for data analytics and general compute, and backup and
archival storage. You then identified that Cloud Storage is suited for the storage of unstructured
data. Next, you discovered that the different Cloud Storage classes differ based on how often
the data is accessed.

They are Multi-regional, Regional, Nearline, and Coldline. You also learned that Cloud SQL is a
fully managed relational database service that makes it easy to set up, maintain, manage, and
administer relational MySQL and PostgreSQL databases in the cloud. And you identified that
Cloud Spanner combines the benefits of a relational database structure with non-relational
horizontal scale.

125
(Refer Slide Time: 00:59)

Next, you discovered that Cloud Datastore is ideal for rapid and flexible web and mobile
development because it’s highly scalable. And finally, you learned that Cloud BigTable aligns
with non-relational database requirements and is a high-performance NoSQL database service
for large analytical and throughput intensive operational workloads.

126
Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-36
Module Introduction

Hi, I am Jimmy. Welcome to the module "There's an API for that!" In this module, you will look
at how to build applications with managed services. So far in this course, you have learned what
GCP is and why you should have a solid platform before beginning a migration to GCP. You have
also learned how GCP supports the building of apps and about the different storage and database
options in GCP. Now, you'll learn about REST APIs, the services that are available to help you
manage your APIs, and how Cloud Pub/Sub works.

(Refer Slide Time: 00:37)

The objective of this module is to review the different application managed service options in the
cloud. To achieve this goal, you'll need to meet the following learning objectives: discuss the
purpose of APIs, explain the format of a REST API, compare and contrast the Cloud Endpoints
and Apigee API management services, identify the use case for a managed messaging service, and
discuss how Cloud Pub/Sub is used as a managed messaging service.

127
(Refer Slide Time: 01:12)

These are the topics that make up the module. You begin by learning about the purpose of APIs
and discuss REST APIs specifically, which are currently the most popular style for services. You
will explore Cloud Endpoints, which is a distributed API management system, before completing
a hands-on lab where you'll deploy a sample API with Cloud Endpoints. The next topic looks at
Apigee Edge, which is a platform for developing and managing API proxies.

You'll then learn about the use cases for managed messaging services, followed by Cloud Pub/Sub,
which is Google's managed messaging system. Next, in a lab, you will publish messages with Cloud
Pub/Sub using the Python client library. You will end with a short quiz and a quick recap of all
the topics you have learned, which will take you to the end of this module.

128
Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-37
The Purpose of APIs

(Refer Slide Time: 00:04)

Let's start by discussing the purpose of APIs. An API is a software structure that presents
a clean, well-defined interface that abstracts away unnecessary detail. APIs are used to simplify
how disparate software resources communicate. By using a universal structure of
communication, they open up a wide range of opportunities.

129
(Refer Slide Time: 00:26)

Representational State Transfer, or REST, is currently the most popular architectural style for
services. REST outlines a key set of constraints and agreements that a service must comply
with; if a service complies with these REST constraints, it is said to be RESTful. The web is
HTTP-based and provides an architectural structure that scales well and stands the test of time.
REST transfers the ideas that work so well for the web and applies them to services.

APIs intended to be spread widely to consumers and deployed to devices with limited computing
resources like mobile are well-suited to a REST structure. REST APIs use HTTP requests to
perform GET, PUT, POST, and DELETE operations. Having different software services
leverage a universal communication channel ensures applications can get updated or rewritten
and still be able to work with other applications as long as they conform to the agreed-upon API
standard.

One of the main reasons REST APIs work well with the cloud is their stateless nature:
state information does not need to be stored or referenced for the API to run. Authentication can
be done through OAuth, and security can be maintained by leveraging tokens.
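
To illustrate these ideas, here is a hedged sketch of a client calling a RESTful API over HTTP with Python's requests library. The base URL, resource names, and bearer token are placeholders, not a real service.

# A sketch of a client calling a RESTful API with the requests library.
# The URL, resource names, and token are placeholders.
import requests

BASE_URL = "https://api.example.com/v1"
headers = {"Authorization": "Bearer YOUR_OAUTH_TOKEN"}   # token-based auth

# GET: read a collection of resources.
books = requests.get(f"{BASE_URL}/books", headers=headers).json()

# POST: create a new resource.
created = requests.post(
    f"{BASE_URL}/books",
    headers=headers,
    json={"title": "Site Reliability Engineering", "year": 2016},
)
print(created.status_code, created.json())

# DELETE: remove a resource. No client-side state is carried between calls.
requests.delete(f"{BASE_URL}/books/123", headers=headers)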

130
(Refer Slide Time: 01:59)

When deploying and managing APIs on your own, there are a number of issues to consider, for
example: the language or format you will use to describe the interface, how you will authenticate
the services and users who invoke your API, how you will ensure that your API scales to meet
demand, and whether your infrastructure logs details of API invocations and provides monitoring
metrics.

131
Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-38
Cloud Endpoints

(Refer Slide Time: 00:10)

In this topic, you will explore Cloud Endpoints, a way to develop, deploy, and manage APIs on any
Google Cloud backend. Cloud Endpoints is a distributed API management system. With Cloud
Endpoints, you can control who has access to your API. You can generate API keys in the GCP
console, validate them on every API call, and share your API with other developers to allow them
to generate their own keys. You can also validate calls with JSON Web Tokens.

The integration with Auth0 and Firebase Authentication allows you to identify the users of your
web or mobile application.
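
As a small, hypothetical illustration, a client calling an API managed by Cloud Endpoints typically passes an API key that the proxy validates on every call. The host name, path, parameters, and key below are placeholders.

# A sketch of calling an Endpoints-managed API with an API key generated in
# the GCP console. Host, path, parameters, and key value are placeholders.
import requests

ENDPOINT = "https://my-api.endpoints.my-project.cloud.goog/v1/airports"
API_KEY = "YOUR_API_KEY"

# The proxy in front of the backend checks the "key" query parameter.
response = requests.get(ENDPOINT, params={"key": API_KEY, "iataCode": "SFO"})
print(response.status_code, response.text)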

132
(Refer Slide Time: 00:46)

The extensible service proxy delivers security and insight in less than one millisecond per call.
You can automatically deploy your API with App Engine and Google Kubernetes Engine or add
Google's proxy container to your Kubernetes deployment.

(Refer Slide Time: 01:05)

133
You can monitor critical operations metrics in the GCP console, such as error rates and latency,
and gain insights into your users and usage with Stackdriver Trace and Stackdriver Logging. You
can use BigQuery to perform further analysis.

(Refer Slide Time: 01:24)

You can get started quickly by using your favorite API framework and language, or choose our
open-source Cloud Endpoints frameworks in Java or Python. You can also upload an OpenAPI
specification and deploy our containerized proxy.

134
(Refer Slide Time: 01:43)

Cloud Endpoints supports applications running on GCP's compute platforms, in your choice of
language and client technology. It allows you to establish a standardized API for mobile
or web client applications to enable them to connect to and use a backend application on App
Engine. It also provides mobile or web applications access to the full resources of App Engine.

(Refer Slide Time: 02:13)

In the previous topic, we discussed how it can be difficult to deploy and manage APIs on your own.
This graphic summarizes how Cloud Endpoints provides the infrastructure support needed to
deploy and manage robust, secure, and scalable APIs. Cloud Endpoints supports the OpenAPI
specification and the gRPC API specification. Cloud Endpoints also supports service-to-service
authentication and user authentication with Firebase, Auth0, and Google authentication. The
Extensible Service Proxy, Service Management, and Service Control together validate requests, log
data, and handle high traffic volumes.

Logging and Trace allow you to view detailed logs, trace lists, and metrics related to traffic
volume, latency, size of requests and responses, and errors.

136
Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-39
Using Apigee Edge

(Refer Slide Time: 00:08)

In this topic, you will learn about Apigee Edge, another platform for managing APIs. Apigee
Edge allows you to front your services with a proxy layer. An API proxy is your interface to
developers that want to use your backend services. Rather than having them consume those
services directly, they access an Apigee Edge API proxy that you create. With the proxy, you can provide
value-added features such as security, rate limiting, quotas, caching and persistence, analytics,
transformations, fault handling, and much more.

Many users of Apigee Edge provide a software service to other companies, and those
features come in handy. Because the backend services for Apigee Edge need not be in GCP,
engineers often use it when working to take a legacy application apart. Instead of replacing a
monolithic application in one risky move, they can instead use Apigee Edge to peel off its
services one by one, standing up microservices to implement each in turn, until the legacy
application can finally be retired.

137
(Refer Slide Time: 01:17)

An API gateway creates a layer of abstraction and insulates the clients from the partitioning of
the application into microservices. You can use Cloud Endpoints to implement API gateways.
Additionally, the API for your application can run on backends such as App Engine, Google
Kubernetes Engine, or Compute Engine. If you have legacy applications that cannot be
refactored and moved to the cloud, consider implementing APIs as a facade or adapter layer.

Each consumer can then invoke these modern APIs to retrieve information from the backend
instead of implementing functionality to communicate using outdated protocols and disparate
interfaces.

138
Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-40
Managed Message Services

(Refer Slide Time: 00:05)

Next, let's look at the use cases for managed message services. Across industry verticals, a
common scenario is that organizations have to rapidly ingest, transform, and analyze massive
amounts of data. For example, a gaming application might receive and process user engagement
and clickstream data. In the shipping industry, Internet of Things applications might receive
large amounts of sensor data from hundreds of sensors. Data processing applications transform the
ingested data and save it in an analytics database. You can then analyze the data to provide
business insights and create innovative user experiences.

139
(Refer Slide Time: 00:45)

Organizations often have complex business processes that require many applications to interact
with each other. For example, when a user plays a song, a music streaming service must perform
many operations in the background. There might be operations to pay the record company,
perform live updates to the catalog, update song recommendations, handle ad interaction
events, and perform analytics on user actions. Such complex application interactions are difficult
to manage with brittle point-to-point application connections.

(Refer Slide Time: 01:28)

140
There are many different reasons why a managed messaging system might be used. Balancing
workloads in network clusters: for example, a large queue of tasks can be efficiently
distributed among multiple workers, such as Compute Engine instances. Implementing
asynchronous workflows: an order processing application can place an order on a topic, from
which it can be processed by one or more workers.

(Refer Slide Time: 01:55)

Distributing event notifications: a service that accepts user sign-ups can send notifications
whenever a new user registers, and downstream services can subscribe to receive notifications
of the event.

141
(Refer Slide Time: 02:10)

Refreshing distributed caches: an application can publish invalidation events to update the IDs of
objects that have changed.

(Refer Slide Time: 02:21)

Logging to multiple systems: a Compute Engine instance can write logs to the monitoring system,
to a database for later querying, and so on.

142
(Refer Slide Time: 02:32)

Data streaming from various processes or devices: for example, a residential sensor can stream
data to back-end servers hosted in the cloud.

(Refer Slide Time: 02:42)

Reliability improvement: a single-zone Compute Engine service can operate in additional zones
by subscribing to a common topic, to recover from failures in a zone or region.

143
Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-41
Cloud Pub/Sub

Cloud Pub/Sub is our own managed message service, and you will learn about this next. Cloud
Pub/Sub is a real-time messaging service that allows you to capture data and rapidly pass
massive amounts of messages between other GCP services and other software applications.
Think of it as a connector that removes the time that you would typically spend managing
operations.

(Refer Slide Time: 00:24)

One of the primary use cases for inter-app messaging is to ingest streaming event data. Cloud
Pub/Sub allows you to make your events accessible through messaging middleware. Cloud
Pub/Sub will reliably deliver each event to the services that must react to it. Upon event
publication, Cloud Pub/Sub push subscriptions deliver the events to serverless apps running in
Cloud Functions, App Engine, or Cloud Run.

144
Pull subscriptions make events available to more complex, stateful services running in Google
Kubernetes Engine or Cloud Dataflow. Multi-region environments operate seamlessly
because of the global nature of Cloud Pub/Sub. Cloud Pub/Sub lets you focus on application
logic regardless of location or scale. The service is minimal and easy to start with, and it eliminates
the operational, scaling, compliance, and security surprises that inevitably reveal themselves in
software projects. Always-on features include end-to-end encryption, Identity and Access
Management, and audit logging.

It also includes NoOps, fully automated scaling, and provisioning with virtually unlimited
throughput. Further features of Cloud Pub/Sub include extreme data durability and availability
with synchronous cross-zone replication, as well as native client libraries in major languages and
an open service API.

(Refer Slide Time: 01:56)

Cloud Pub/Sub is called middleware because it is positioned between applications. It is used
between data gathering and processing systems. For example, if an organization is hiring a new
employee, the company's HR system can use Cloud Pub/Sub to notify its other business
systems that a new employee has been hired, pass on relevant information, and initiate actions.
Cloud Pub/Sub is often found in the middle of systems like this.

145
(Refer Slide Time: 02:32)

Publisher applications can send messages to a topic and subscriber applications can subscribe to
that topic to receive the message when the subscriber is ready. This can take place
asynchronously. It is important to understand that subscribers only receive messages from the
initial publisher. It’s best practice when using Cloud Pub/Sub with GCP tools to specify a
subscription instead of a topic for reading.
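
The sketch below illustrates this publisher and subscriber model with the google-cloud-pubsub Python client library, the same library used in the lab. The project, topic, and subscription IDs are placeholders and are assumed to exist already.

# A minimal sketch of publishing and receiving messages with the
# google-cloud-pubsub Python client. Project, topic, and subscription IDs
# are placeholders and are assumed to exist already.
from google.cloud import pubsub_v1

project_id = "my-project"

# Publisher side: send a message to a topic.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(project_id, "new-hires")
future = publisher.publish(topic_path, b"employee record", department="engineering")
print("Published message ID:", future.result())

# Subscriber side: receive messages from a subscription on that topic.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(project_id, "new-hires-hr-sub")

def callback(message):
    print("Received:", message.data, dict(message.attributes))
    message.ack()   # acknowledge so Pub/Sub does not redeliver the message

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
# streaming_pull.result(timeout=30)  # uncomment to block and receive messages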

(Refer Slide Time: 03:02)

146
Cloud Pub/Sub acts as a buffer between sending and receiving across software applications,
which makes it easier for developers to connect applications. For example, Cloud Pub/Sub can
be used to guarantee that email messages get delivered swiftly to online users, as well as to offline
users when they come back online. Cloud Pub/Sub can act as a shock absorber within the
architecture. If there is a sudden influx of messages, Cloud Pub/Sub avoids the risk of
overwhelming the consumers of those messages because it will absorb the sudden increase in
messages. The consumers can continue to pull as many messages as they can handle at once.
Messages can be pushed to any secure web server or pulled from anywhere on the internet.

(Refer Slide Time: 03:51)

This topology represents a slightly more complex setup of Cloud Pub/Sub. Note that everything
in the green box is part of the Cloud Pub/Sub managed service. You would supply the publishers and subscribers yourself, by writing applications or leveraging other managed services.

147
(Refer Slide Time: 04:12)

Within the common big data processing model, Cloud Pub/Sub is found in the ingest phase.
Let’s explore the rest of this diagram. The first step in processing data is capturing and bringing
it into the system. GCP has several tools to help with this, including Cloud IoT and Cloud
Pub/Sub. Cloud Pub/Sub ingests event streams from anywhere at any scale for simple, reliable,
real-time stream analytics. The second step is to process the data. GCP tools during this stage
include Cloud Dataproc and Cloud Dataflow. The third step is to store the data and ensure the
right accessibility needed. GCP tools in this stage include Cloud Storage, Cloud SQL, Cloud
Spanner, and Cloud BigTable. Finally, users like you are looking to analyze the data to capture
insights. GCP products for this stage include BigQuery, Cloud BigTable, AI Platform, and Cloud
Dataproc.

148
(Refer Slide Time: 05:26)

There are many examples of Cloud Pub/Sub working. Every time your Gmail displays a new
message, it is because of a push notification to your browser or mobile device.

(Refer Slide Time: 05:38)

The updating of search results as you type is a feat of real-time indexing that depends on Cloud Pub/Sub to update caches with breaking news.

149
(Refer Slide Time: 05:49)

Among the most important real-time information streams for some companies is advertising revenue. They can use Cloud Pub/Sub to broadcast budgets to their entire fleet of search engines.

150
Google Cloud Computing Foundation Course
Jimmy Tran
SMB Growth Program Manager
Google Cloud

Lecture-42
Summary

(Refer Slide Time: 00:06)

That concludes the module "There's an API for that!". Here is a reminder of what you have learned. You began by learning that APIs are software structures written to present clean, well-defined interfaces that remove needless detail. Next, you discovered that using REST APIs ensures that legacy and newly created apps can communicate clearly with each other. When designing and developing your APIs, you also found out that you need to make a range of considerations that include the interface, authorization and authentication, management and scalability, and logging and monitoring.

The next part of the module focused on different tools that you can use to manage your APIs. These included Cloud Endpoints, which is a distributed API management system that helps you create and maintain APIs, and Apigee Edge, a platform for developing and managing API proxies. Apigee Edge wraps around your APIs and has a business focus: rate limiting, quotas, analytics, and so on.

151
(Refer Slide Time: 01:14)

Next, you learned about managed messaging systems. They ingest, transform, and analyze massive amounts of data, and they also orchestrate complex business processes. Finally, you learned that Cloud Pub/Sub passes messages between data-gathering and processing systems. Cloud Pub/Sub is a service that helps customers capture data and rapidly pass massive amounts of messages between Google Cloud Platform services, big data tools, and other software applications, with world-class security.

152
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-43
Module introduction

Hi there, I am Seth, and welcome to "You can't secure the cloud, right?". You have reached module 6 in the Google Cloud Computing Foundations Course. So far in this course, you have learned about cloud computing and its importance. You have explored how to leverage Google Cloud to build applications, and you have examined storage options and APIs. In this module, you will learn about Google Cloud security best practices.

(Refer Slide Time: 00:27)

The objective of this module is to outline how security in the cloud is administered in GCP. To
do that, you will need to meet the following learning objectives. You will learn how to describe
the shared security model of the cloud, discuss Google's security responsibilities versus your
security responsibilities, and explore the different encryption options available on GCP.

153
(Refer Slide Time: 00:52)

You will also identify best practices when configuring authentication and authorization using
cloud IAM.

(Refer Slide Time: 01:00)

These are the topics that make up the module. You will start by considering what security in the cloud actually means, followed by an overview of the shared security model. Here you will learn about Google's security responsibilities and your security responsibilities as a cloud consumer. Then you will move on to exploring encryption options and understanding authentication and

154
authorization with Cloud IAM. You will explore Cloud Identity-Aware Proxy and then complete a lab where you will set up user authentication using Cloud IAP.

You will then identify best practices for authorization using Cloud IAM. The module concludes with a short quiz and a recap of the main learning points.

155
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-44
Introduction to Security in the Cloud

(Refer Slide Time: 00:04)

Let’s start with the first topic, an introduction to security in the cloud. At Google, we believe that
security empowers innovation. If you put security first, everything else will follow.
(Refer Slide Time: 00:13)

156
Google has been operating securely in the cloud for nearly 20 years. We have seven services,
each with over 1 billion users every day. This means that Google and Google Cloud connect to
more than a billion IP addresses every day. Designing for security is pervasive throughout our
entire infrastructure, and security is always paramount. Countless organizations have lost data
due to a security incident. A single breach could cost millions in fines and lost business. But
more importantly, a serious data breach can permanently damage an organization's reputation
with the loss of customer trust.

As a result, security is increasingly top of mind for organizational leadership like CEOs and CSOs. Unfortunately, many organizations don't have access to the resources they need to
implement state-of-the-art security controls and techniques. Google invests heavily in its
technical infrastructure and has dedicated engineers tasked with providing a secure and robust
platform.

(Refer Slide Time: 01:25)

By choosing GCP, you can leverage that same infrastructure to help secure your services and
data through the entire information processing lifecycle, including the deployment of services,
data storage, communication between services, and operation by administrators.

157
(Refer Slide Time: 00:04)

Security cannot be an afterthought; it must be fundamental in all designs. That's why, at Google, we build security through progressive layers that deliver true defense-in-depth, meaning our cloud infrastructure does not rely on any one single technology to make it secure.

(Refer Slide Time: 02:04)

Let’s start by talking about securing low level infrastructure. We design and build our own data
centers that incorporate multiple layers of physical security protections. As just one example,
access to these data centers is limited to a very small fraction of Google employees. We design
our own servers, networking equipment, and hardware security chips in those data centers. Our

158
servers use cryptographic signatures to make sure they are booting the correct software at the
correct version in the correct data center.

(Refer Slide Time: 02:42)

Now, let's talk about a different layer of the stack: service deployments. Specifically, let's talk
about Google service deployments, which provide the fabric for Google Cloud. When services
communicate with one another, they do so via a remote procedure call or RPC. If you are not
familiar with RPC, that is okay. It is just a way to facilitate communication between two services
like REST over HTTP or XML over SOAP.

Google's infrastructure provides cryptographic privacy and integrity for all RPC calls for service
to service communication. In addition to our own security practices, Google also has an external
bug bounty program in which third-party security researchers and developers can gain monetary
rewards for finding vulnerabilities in Google's software components.

159
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-45
Understanding the Shared Security Model

(Refer Slide Time: 00:08)

Now that you have an overview of Google's security model, you will learn about your security responsibilities in the cloud. When you build and deploy an application on your on-premise infrastructure, you are responsible for the security of the entire stack: from the physical security of the hardware and the premises in which it is housed, through the encryption of the data on disk and the integrity of the network, all the way up to securing the content stored in those applications.

But when you move an application to Google Cloud, Google handles many of the lower-level security layers, like physical security, disk encryption, and network integrity. The upper layers of the security stack, including securing data, remain your responsibility. We provide tools like the resource hierarchy and Cloud IAM to help you define and implement policies, but ultimately this part is your responsibility.

160
(Refer Slide Time: 00:58)

Data access is almost always your responsibility. Simply put, you control who or what has access to your data at any time. Google Cloud provides tools that help you control this access, such as Cloud Identity and Access Management, but they must be properly configured to protect your data.

161
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-46
Explore encryption options

(Refer Slide Time: 00:05)

You'll now explore the options that Google Cloud offers for encrypting your data. There are several encryption options available on Google Cloud. These range from simple options with limited control to options that offer greater control and flexibility, but with more complexity. The simplest option is GCP default encryption, followed by customer-managed encryption keys or CMEK, and then the option that provides the most control: customer-supplied encryption keys or CSEK.

A fourth option is to encrypt the data locally before you store it in the cloud. This is often called
client-side encryption.

162
(Refer Slide Time: 00:39)

By default, GCP encrypts data in transit and at rest. Data in transit is encrypted via TLS, and data at rest is encrypted with an AES 256-bit key. The encryption happens automatically.

(Refer Slide Time: 00:54)

With customer-managed encryption keys, or CMEK, you manage your own encryption keys that protect data on Google Cloud. The Google Cloud Key Management Service, or Cloud KMS, automates and simplifies the generation and management of encryption keys. The keys are managed by

163
you, the customer, but the keys never leave Google Cloud. Cloud KMS supports encryption,
decryption, signing, and verifying of data.

It supports both symmetric and asymmetric cryptographic keys and a variety of popular
algorithms. Cloud KMS allows you to rotate keys manually and automate the rotation of keys on
a time-based interval. Cloud KMS also supports both symmetric and asymmetric keys for
encryption and signing.
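As a rough illustration of Cloud KMS in action, here is a minimal sketch (not part of the original lecture) using the google-cloud-kms Python client library. The project, key ring, and key names are hypothetical, and the key is assumed to already exist with the encrypt/decrypt purpose.

from google.cloud import kms

client = kms.KeyManagementServiceClient()
# Hypothetical project, location, key ring, and key names.
key_name = client.crypto_key_path("my-project", "global", "my-key-ring", "my-key")

# Encrypt a small payload with the key held in Cloud KMS.
encrypt_response = client.encrypt(request={"name": key_name, "plaintext": b"sensitive data"})

# Decrypt it again; the key material itself never leaves Google Cloud.
decrypt_response = client.decrypt(request={"name": key_name, "ciphertext": encrypt_response.ciphertext})
assert decrypt_response.plaintext == b"sensitive data"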

(Refer Slide Time: 01:45)

Customer-supplied encryption keys, or CSEK, give you more control over your keys, but with greater management complexity. With CSEK, you use your own AES 256-bit encryption keys. You are responsible for generating these keys, for storing them, and for providing them as part of your GCP API calls. Google Cloud will use the provided key to encrypt the data before persisting it. We guarantee that the key only ever exists in memory and is immediately discarded after use.

164
(Refer Slide Time: 02:19)

Persistent disks such as those backing virtual machines can be encrypted with customer-supplied
encryption keys. With customer-supplied encryption keys for persistent disks, the data is
encrypted before it leaves your virtual machine. But even without CSEK or CMEK, your
persistent disks are still encrypted with Google’s default encryption. When a persistent disk is
deleted, the keys are discarded, rendering the data irrecoverable by traditional means.

(Refer Slide Time: 02:48)

165
For even more control over persistent disk encryption, you can create your own persistent disks
and redundantly encrypt them.

(Refer Slide Time: 02:58)

And finally, client-side encryption is always an option. With client-side encryption, you encrypt
data before sending it to Google Cloud. Neither the unencrypted data nor the decryption keys
ever leave your local device.
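To illustrate the idea, here is a minimal sketch of client-side encryption (not part of the original lecture), assuming the cryptography and google-cloud-storage Python libraries and a hypothetical bucket name; only ciphertext is ever sent to Google Cloud.

from cryptography.fernet import Fernet
from google.cloud import storage

# Generate and keep the key locally; it never leaves your device.
key = Fernet.generate_key()
ciphertext = Fernet(key).encrypt(b"my secret document")

# Upload only the ciphertext to Cloud Storage.
bucket = storage.Client().bucket("my-bucket")            # hypothetical bucket
bucket.blob("secret.enc").upload_from_string(ciphertext)

# Later: download and decrypt locally with the same key.
downloaded = bucket.blob("secret.enc").download_as_bytes()
plaintext = Fernet(key).decrypt(downloaded)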

166
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-47
Understand authentication and authorization

In this next lesson, you will learn how to leverage authentication and authorization with Google
Cloud IAM to improve your infrastructure's security.

(Refer Slide Time: 00:11)

Cloud Identity and Access Management, or Cloud IAM, enables cloud administrators to authorize who can do what on which resource in Google Cloud.

167
(Refer Slide Time: 00:21)

IAM policies can apply to many types of users. The 'who' part of an IAM policy can be a Google account or a Cloud Identity user, a service account, a Google Group, or an entire G Suite or Cloud Identity domain.

(Refer Slide Time: 00:40)

Many users get started by logging into the GCP console with a personal Gmail account. To collaborate with their teammates, they use Google Groups to gather together people who are in the same role. This approach is easy to get started with, but its disadvantage is that the team's identities

168
aren't centrally managed. For example, if someone leaves the organization or team, there is no central way to immediately remove their access to cloud resources.

(Refer Slide Time: 01:07)

GCP users who are also G Suite users can define Google Cloud policies in terms of G Suite users and groups. This way, when someone leaves the organization, an administrator can immediately disable their account using the Google Admin console for G Suite. GCP users who are not G Suite users can gain these same capabilities through Cloud Identity. Cloud Identity allows users and groups to be managed using the Google Admin console, but the G Suite collaboration products like Gmail, Docs, Drive, and Calendar are not included. For this reason, Cloud Identity is available for free.

169
(Refer Slide Time: 01:47)

But what if you already have a centralized user management and identity system, like Microsoft Active Directory or LDAP? Google's Cloud Directory Sync can help. This tool synchronizes users and groups from an existing Active Directory or LDAP system, mapping the users and groups into a Cloud Identity domain. This synchronization is only one way, though: Cloud Directory Sync cannot modify information in Microsoft Active Directory or LDAP systems. Cloud Directory Sync is usually scheduled to run without supervision on a fixed interval, like every 24 hours.

(Refer Slide Time: 02:24)

170
We've mentioned Cloud Identity a few times now; let's dive into a little more detail. Cloud Identity is a unified identity, access, and device management platform. Cloud Identity is an identity-as-a-service solution: a service for managing users, groups, and security settings. Cloud Identity can be used as a central source for domain-wide settings. Cloud Identity is associated with a unique public domain, and it can work with any domain name that is enabled for receiving email messages.

(Refer Slide Time: 02:58)

Now that we've talked about the 'who', let's discuss the 'can do what' part of IAM. The 'can do what' part is defined by an IAM role, which is a collection of IAM permissions. Permissions are very low-level and fine-grained. For example, to manage a virtual machine, you need permissions to create, delete, stop, start, and change an instance. To make this process easier, permissions are often grouped together into an IAM role, which makes them easier to manage.

There are built-in roles available to all GCP users, and you can also build and customize your own roles for your organization.
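As a small illustration of granting a role (not part of the original lecture), here is a minimal sketch that adds a predefined Compute Engine role for a Google Group to a project's IAM policy, using the Cloud Resource Manager API via the google-api-python-client library. The project ID and group address are hypothetical, and Application Default Credentials are assumed.

from googleapiclient import discovery

crm = discovery.build("cloudresourcemanager", "v1")
project_id = "my-project"  # hypothetical

# Read the current policy, append a binding, and write it back.
policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()
policy.setdefault("bindings", []).append({
    "role": "roles/compute.instanceAdmin.v1",
    "members": ["group:vm-operators@example.com"],  # hypothetical group
})
crm.projects().setIamPolicy(resource=project_id, body={"policy": policy}).execute()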

171
(Refer Slide Time: 03:43)

Finally, let's discuss the 'on which resource' part of IAM. When you give a user, group, or service account permissions on a specific element of the resource hierarchy, the resulting policy applies to the element you choose as well as to the elements below that resource in the hierarchy.

(Refer Slide Time: 04:02)

There are three kinds of roles in Cloud IAM: primitive, predefined, and custom. Let's talk about each of them in turn.

172
(Refer Slide Time: 04:11)

IAM primitive roles apply across all GCP resources in a project. The primitive roles are Owner, Editor, Viewer, and Billing Administrator. If you are a Viewer, you can examine resources, but you cannot change their state. If you are an Editor, you can do everything a Viewer can do, plus modify state. And if you are an Owner, you can do everything an Editor can do, plus manage roles and permissions. The Owner role on a project also gives you control of its billing and cost management functionality.

Often, organizations want someone to be able to control the billing for a project without the right to change the resources in that project. You can grant someone the Billing Administrator role, which grants access to billing information but does not grant access to resources inside the project.

173
(Refer Slide Time: 05:07)

IAM predefined roles apply to a particular GCP service in a project. GCP services offer their
own set of predefined roles, and they define where those roles can be applied. For example,
Google Compute Engine offers a set of predefined roles, and you can apply them to Compute
Engine resources in a given project, a given folder, or the entire organization. Another example is Cloud Bigtable, which is a managed database service. Cloud Bigtable offers roles that can apply across an entire organization, to a particular project, or even to individual Cloud Bigtable database instances.

(Refer Slide Time: 05:47)

174
IAM predefined roles offer more fine-grained permissions on particular services. The Google Compute Engine instance admin role, for example, allows whoever holds it to perform a certain set of actions on virtual machines. In this example, all the users of a certain Google Group have the role, and they have it on all virtual machines in project A. The last kind of IAM role is a custom role; for some organizations, the primitive and predefined IAM roles may not offer enough granularity.

(Refer Slide Time: 06:22)

IAM custom roles allow you to create your own roles that are composed of very granular permissions. In this example, we have defined a new custom role, named Instance Operator, that allows users to start and stop instances but does not give them permission to delete or reconfigure them. At this time, custom roles can only be applied at the project and organization levels; it is not currently possible to apply custom roles at the folder level.

175
(Refer Slide Time: 06:49)

Another important concept related to identity and access management is service accounts. Service accounts control service-to-service communication. In order for services to interact with each other, they need some kind of identity, and service accounts are used to authenticate that service-to-service communication. With service accounts, you can give role-level access from one service to another. Suppose you have an application running in a virtual machine that needs to access data in Cloud Storage.

You only want that virtual machine to have access to that data. You can create a service account that is authorized to access that data in Cloud Storage and then attach that service account to the virtual machine. Service accounts are named with an email address, often ending in gserviceaccount.com.
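Here is a minimal sketch (not part of the original lecture) of creating such a service account with the IAM API via the google-api-python-client library; the project ID, account ID, and display name are hypothetical.

from googleapiclient import discovery

iam = discovery.build("iam", "v1")

# Create a service account dedicated to reading data from Cloud Storage.
service_account = iam.projects().serviceAccounts().create(
    name="projects/my-project",  # hypothetical project
    body={
        "accountId": "storage-reader",
        "serviceAccount": {"displayName": "Reads objects from Cloud Storage"},
    },
).execute()

# The generated email can then be granted roles and attached to a VM.
print(service_account["email"])  # e.g. storage-reader@my-project.iam.gserviceaccount.com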

176
(Refer Slide Time: 07:44)

In this example, a service account has been granted the instance admin role. This would allow an application running in a virtual machine with that service account to create, modify, and delete other virtual machines. Incidentally, service accounts need to be managed, too. For example, maybe Alice needs to manage what can act as a given service account, while Bob only needs to view what a particular service account can do. Fortunately, in addition to being an identity, a service account is also a resource.

So, it can have IAM policies of its own attached to it. For instance, Alice can have the Editor role on a service account, and Bob can have the Viewer role. This is just like granting roles for any other GCP resource, like a virtual machine.

177
(Refer Slide Time: 08:33)

You can grant virtual machines different identities. This makes it easier to manage different project permissions across your applications. You can also manage the permissions of the service accounts without having to recreate the virtual machines. Here is a more complex scenario: say you have an application that is implemented across a group of virtual machines. One component of your application requires the Editor role on another project, project B, but another component doesn't need any permissions on project B.

You would want to create two different service accounts, one for each subgroup of virtual machines. In this example, the VMs running component_1 are granted Editor access to project B by using Service Account 1. Virtual machines running component_2 are granted object Viewer access to bucket_1 using Service Account 2. Service account permissions can be changed without recreating the VMs.

178
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-48
Identify best practices for authorization

(Refer Slide Time: 00:06)

In this lesson, you will learn the best practices for authorization using Cloud IAM. First, leverage and understand the resource hierarchy; specifically, use projects to group resources that share the same trust boundary. Check the policy granted on each resource and make sure you recognize the inheritance. Because of inheritance, always use the principle of least privilege when granting roles. Finally, audit policies using Cloud Audit Logs and audit the memberships of groups used in policies.

179
(Refer Slide Time: 00:34)

When it comes to using service accounts, here are a few best practices. Be very careful when granting the Service Account User role, as it provides access to all the resources to which the service account has access. Also, when you create a service account, give it a display name that clearly identifies its purpose, ideally using an established naming convention inside your organization. And when it comes to service account keys, establish key rotation policies and methods.

180
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-49
Quiz

(Refer Slide Time: 00:03)

It is now time to test your knowledge with a short quiz. True or false: if a Cloud IAM policy gives you Owner permissions at the project level, your access to a resource in the project may be restricted by a more restrictive policy on that resource. This is false: policies are a union of the parent and the resource, so if a parent policy is less restrictive, it overrides a more restrictive resource policy.

181
(Refer Slide Time: 00:25)

How are user identities created in Cloud IAM? They are not created in Cloud IAM itself; creating users and groups within GCP is not possible, so identities come from Google accounts, Google Groups, G Suite domains, or Cloud Identity domains.

182
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-50
Summary

(Refer Slide Time: 00:07)

That concludes the module "You can't secure the cloud, right?". Let me remind you of what you learned. In this module, you started by learning how security in the cloud is administered by GCP. You thought about the importance of security in the cloud and how it shapes the use and development of the Google Cloud Platform. You also determined the security responsibilities shared between Google, which is responsible for managing its infrastructure security, and you, the customer, who is responsible for securing your data.

You also discovered that there are various encryption options available, including Google default encryption, CMEK, CSEK, and client-side encryption. Next, you learned how Cloud Identity and Access Management can control who can do what on which resource in GCP.

183
(Refer Slide Time: 00:52)

And you discovered Cloud Identity-Aware Proxy, which lets you establish a central authorization layer for applications accessed over TLS, so you can use an application-level access control model instead of relying on network-level firewalls. And finally, you were introduced to best practices for authorization using IAM, which included leveraging and understanding the resource hierarchy, granting roles to groups instead of individuals, and planning carefully how to use service accounts.

184
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-51
Module introduction

Hi, I am Priyanka, and welcome to the module "It Helps to Network". In this module, you will be learning about networking in the Google Cloud Platform. So far in this course, you have learned what GCP is, how GCP supports the building of apps, the different storage options, the use of APIs, and cloud security. In this module of the Google Cloud Computing Foundations Course, you will find out how networking in GCP works and what you need to consider before setting up those networks.

(Refer Slide Time: 00:31)

The main objective of this module is to demonstrate how to build secure networks in the cloud.
To achieve this goal, you will need to meet the following learning objectives: Provide an
overview of networking in the cloud, discuss how to build virtual private clouds, explain the use
of public and private IP addresses, describe the Google network, including regions, zones, cache nodes, points of presence, and the fiber architecture, and explore the role of firewall rules and routes.

185
(Refer Slide Time: 01:02)

You will also explore various hybrid cloud networking options and differentiate between load-
balancing options.

(Refer Slide Time: 01:09)

These are the topics that make up the module. First, you will be introduced to networking in the cloud. Then you will learn what a virtual private cloud is, followed by an introduction to public and private IP addresses and a review of Google's network architecture. You will then learn about

186
routes and firewall rules in the cloud before completing a hands-on lab to discover the
fundamentals of VPC networking.

The next topic will explore how multiple VPC networks can be used, supported by two labs. In
the first, you will create VPC networks and VM instances, and in the other, you will create a web
server and explore Identity and Access Management roles and service accounts. You will then
learn how to build hybrid clouds using VPNs, interconnects, and direct peering. Next, you will think about the different options for load balancing, followed by another two labs.

In the first lab, you will configure an HTTP load balancer and perform a stress test using Cloud Armor. In the second lab, you will configure and test an internal load balancer. The module will
end with a short quiz to help you check your understanding, followed by a recap of the key
learning points covered in the module.

187
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-52
Introduction to Networking in The Cloud

(Refer Slide Time: 00:05)

Let’s start with the first topic, an introduction to networking in the cloud. Computers
communicate with each other through a network. The computers in a single location like an
office are connected to a local area network. Multiple locations can have their LAN connected to
a wide area network, or WAN. Today, most networks are connected to the Internet, enabling
millions of personal computers, servers, smartphones, and other devices to communicate and
provide and consume IT services.

188
(Refer Slide Time: 00:33)

Since around 2004, Google has been building out the fastest, most powerful, highest-quality
cloud infrastructure on the planet. Our high-quality private network connects regional locations
to more than 100 global points of presence close to users. GCP also uses state-of-the-art
software-defined networking and distributed systems technologies to host and deliver services
around the world. When every millisecond of latency counts, Google ensures that content is
delivered with the highest throughput.

189
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-53
Defining a Virtual Private Cloud

(Refer Slide Time: 00:05)

In this topic, you will explore what a virtual private cloud is. Virtual private cloud networks or
VPCs are used to build private networks on top of the larger Google network. With VPCs, you
can apply many of the same security and access control rules as if you were building a physical
network. VPCs allow the deployment of Infrastructure-as-a-Service resources such as compute
instances and containers.

VPC networks themselves have no IP address ranges; they are global and span all available GCP regions. VPCs contain sub-networks, which span all the zones in a region, and a VPC network can be the default network or be created in auto or custom mode. Sub-networks are also referred to as subnets. Subnets are regional resources; they must be created in VPC networks to define sets of usable IP ranges for instances. VMs in different zones within the same region can share the same subnet.

190
(Refer Slide Time: 01:00)

In this example, subnet1 is defined as 10.240.0.0/24 in the us-west1 region. Two VM instances in
the us-west1-a zone are in this subnet. Their IP addresses both come from the available range of
addresses in subnet1. Subnet2 is defined as 192.168.1.0/24 in the us-east1 region. Two VM
instances in the us-east1-a zone are in this subnet. Their IP addresses both come from the
available range of addresses in subnet 2.

Subnet 3 is defined as 10.2.0.0/16 also in the us-east1 region. One VM instance in the us-east1-a
zone and a second instance in the us-east1-b zone are in subnet 3. Each receiving an IP address
from its available range. Because subnets are regional resources instances can have their network
interfaces associated with any subnet in the same that contains their zones. A single VPN can be
used to give private connectivity from a physical data center to the VPC.

Subnets are defined by an internal IP address prefix range and are specified as CIDR notations.
CIDR stands for Classless Inter-Domain Routing. IP ranges cannot overlap between subnets.
They can be expanded but can never shrink. While IP ranges are specific to one region, they can
cross zones within the region. You can also create multiple subnets in a single region. Although subnets do not need to conform to a hierarchical IP scheme, the internal IP ranges for a subnet must conform to RFC 1918.

191
(Refer Slide Time: 03:04)

Virtual machines that are in different regions but in the same VPC can communicate privately.
VM1 and VM2 can communicate at a local level even though they are separated geographically.
Virtual machines that reside in different VPCs, even if the subnets are in the same region, need
to communicate via the Internet. In this instance, VM3 and VM4 will need public IP addresses to
traverse the Internet. Networks do not communicate with any other network by default.

(Refer Slide Time: 03:41)

192
GCP offers two types of VPC networks determined by their subnet creation mode. When an auto
mode network is created, one subnet from each region is automatically created within it. As new
GCP regions become available, new subnets and those regions are automatically added to the
auto mode networks. The automatically created subnets use a set of predefined IP ranges, and
default firewall rules are applied.

In addition to the automatically created subnets, you can add more subnets manually to auto mode networks, in regions you choose, using IP ranges outside the set of predefined IP ranges. When
expanding the IP range in an auto mode Network, the broadest prefix you can use is /16. Any
prefix broader than /16 would conflict with the primary IP ranges of other automatically created
subnets. Due to its limited flexibility, an auto mode network is better suited to isolated use cases
such as proof of concept, testing, and so on.

When a custom mode network is created, no subnets are automatically created. This type of
network provides you with complete control over its subnets and IP ranges. You decide which subnets to create, in the regions you choose, using the IP ranges you specify. You also define the
firewall rules, and you can expand the IP ranges to any RFC 1918 size. Custom mode networks
are, therefore, a lot more flexible and are better suited to production environments.

While you can switch a network from auto mode to custom mode, this conversion is one way.
Custom mode networks cannot be changed to auto mode networks.
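To make the custom mode workflow concrete, here is a minimal sketch (not part of the original lecture) that creates a custom-mode VPC and one subnet with the Compute Engine API via the google-api-python-client library. The project, network and subnet names, and CIDR range are hypothetical, and in practice you would wait for the network-creation operation to finish before adding the subnet.

from googleapiclient import discovery

compute = discovery.build("compute", "v1")
project = "my-project"  # hypothetical

# Custom mode: no subnets are created automatically.
compute.networks().insert(project=project, body={
    "name": "prod-vpc",
    "autoCreateSubnetworks": False,
}).execute()

# Add one regional subnet with an RFC 1918 range that you choose.
compute.subnetworks().insert(project=project, region="us-east1", body={
    "name": "prod-subnet-east",
    "network": "projects/" + project + "/global/networks/prod-vpc",
    "ipCidrRange": "10.2.0.0/16",
}).execute()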

193
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-54
Public and Private IP Address Basics

(Refer Slide Time: 00:05)

Next, you will learn the basics of public and private IP addresses in the cloud. A VPC is made up
of subnets. Each sub-network, or subnet, must be configured with a private IP CIDR range. The CIDR range determines which private IP addresses will be used by the virtual machines in the subnet. Private IP addresses are only used for communication within the VPC and cannot be routed to the Internet.

Each octet in an IP address is represented by 8 binary bits, so a typical IPv4 address is 32 bits long. The number at the end of the range determines how many bits will be static, or frozen. This number determines how many IP addresses are available within a CIDR range.

194
(Refer Slide Time: 00:50)

A /16 range will provide 65,536 available IP addresses. Every time you add one to the prefix length, the number of available IP addresses is cut in half. As shown here, by the time /28 is reached, only 16 IP addresses are available.
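You can check these numbers yourself with Python's standard ipaddress module; this small sketch (not part of the original lecture) uses an arbitrary 10.2.0.0 base range for illustration only.

import ipaddress

for prefix in (16, 20, 24, 28):
    network = ipaddress.ip_network("10.2.0.0/" + str(prefix))
    # A /16 yields 65,536 addresses; each extra prefix bit halves the count.
    print("/" + str(prefix), "->", network.num_addresses, "addresses")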

(Refer Slide Time: 01:12)

Let’s look at some of the differences between public and private IP addresses. Internal IP
addresses are allocated to VMs by a dynamic host configuration protocol service or DHCP. The

195
lease for the IPs is renewed every 24 hours, and the name of the virtual machine is the host
name. The host name will be associated with the internal IP address through a network scoped
DNS service. External IP addresses can be ephemeral or reserved and are assigned from a pool
of IP addresses associated with the region.

If you allocate a reserved IP address but do not attach it to a virtual machine, you will be billed
for that IP address. Virtual machines are unaware of their public IP addresses. If you look at the
operating system network configuration, the virtual machine will only show the private IP
addresses.

196
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-55
Google’s Network Architecture

(Refer Slide Time: 00:05)

You will now review Google's network architecture. Virtual Private Cloud is a comprehensive set of networking capabilities and infrastructure that is managed by Google. With a virtual private cloud, you can connect your GCP resources in a virtual private cloud and isolate them from each other for purposes of security, compliance, and development versus test versus production environments. Cloud Load Balancing provides high-performance, scalable load balancing for GCP to ensure consistent performance for users.

197
(Refer Slide Time: 00:33)

A content delivery network serves content to users with high availability and high performance,
usually by storing files closer to the user. With Cloud CDN, Google's global network provides
low latency, low-cost content delivery.

(Refer Slide Time: 00:47)

198
Cloud interconnect lets you connect your own infrastructure to Google's Network edge with
enterprise-grade connections. Connections are offered by our partner network service providers
and may offer higher service levels than standard internet connections.

(Refer Slide Time: 01:03)

Cloud DNS, or Domain Name System, translates requests for domain names into IP addresses. Google provides the infrastructure to publish specific domain names in a high-volume DNS service suitable for production applications.

(Refer Slide Time: 01:18)

199
This map represents the Google Cloud Platform at a high level. GCP consists of regions
represented by the markers in blue together with proposed future regions in white. A region is a
specific geographical location where you can run your resources. The number on each region
represents the number of zones within that region. Points of Presence, or PoPs, are represented by the grey dots; the PoPs are where the Google network is connected to the rest of the internet.

By operating an extensive global network of interconnection points, GCP can bring its traffic close to its peers, thereby reducing costs and providing users with a better experience. Google's
global private network is represented by the blue lines. The network connects regions and pops
and is composed of hundreds of thousands of miles of fiber optic cable and several submarine
cable investments.

The cables that have a year next to them are our latest investments. The last component that makes
up the architecture is Google's services themselves.

200
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-56
Routes and Firewall Rules in the Cloud

(Refer Slide Time: 00:07)

In this topic, you will consider routes and how firewall rules allow traffic to flow within a VPC.
By default, every network has routes that let instances in a network send traffic directly to each
other even across subnets. In addition, every network has a default route that directs packets to
destinations that are outside the network. Although these routes cover most normal routing
needs, you can also create special routes that override these routes.

Just creating a route does not ensure that packets will be received by the specified next hop.
Firewall rules must also allow the packet. The default network has pre-configured firewall rules
that allow all network instances to talk with each other. Manually created networks do not have
such rules. So, you must create them as you will experience in the first lab.

201
(Refer Slide Time: 00:56)

Routes match packets by destination IP addresses. However, no traffic will flow without also
matching a firewall rule. A route is created when a network is created, enabling traffic delivery from anywhere. A route is also created when a subnet is created; this is what enables VMs on the same network to communicate. This diagram shows a simplified routing table, but you will look at this in more detail next. Each route in the routes collection can apply to one or more instances: a route applies to an instance if the network and instance tags match.

If the network matches and there are no instance tags specified, the route applies to all instances
in that network. Compute Engine then uses the routes collection to create individual read-only
routing tables for each instance.

202
(Refer Slide Time: 01:51)

This diagram shows a massively scalable virtual router at the core of each network. Every virtual
machine instance in the network is directly connected to this router. All packets leaving a virtual machine instance are first handled at this layer before they are forwarded to their next hop. The
virtual network router selects the next hop for a packet by consulting the routing table for that
instance.

(Refer Slide Time: 02:19)

203
Every route consists of a destination and a next hop. Traffic whose destination IP is within the
destination range is sent to the next hop for delivery.
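As an illustration only, here is a minimal sketch (not part of the original lecture) of creating such a route with the Compute Engine API via the google-api-python-client library. The project, network, ranges, next-hop IP, and tag are hypothetical, and the next-hop instance would need IP forwarding enabled.

from googleapiclient import discovery

compute = discovery.build("compute", "v1")
project = "my-project"  # hypothetical

compute.routes().insert(project=project, body={
    "name": "route-to-on-prem",
    "network": "projects/" + project + "/global/networks/prod-vpc",
    "destRange": "192.168.0.0/24",   # destination: traffic for this range...
    "nextHopIp": "10.2.0.10",        # ...is sent to this gateway instance's internal IP
    "priority": 1000,
    "tags": ["vpn-clients"],         # the route applies only to instances carrying this tag
}).execute()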

(Refer Slide Time: 02:30)

GCP firewall rules protect virtual machine instances from unapproved connections, both inbound
and outbound, known as ingress and egress, respectively. Essentially every VPC network
functions as a distributed firewall. Although firewall rules are applied to the network as a whole, connections are allowed or denied at the instance level. You can think of the firewall as existing
not only between your instances and other networks but between individual instances within the
same network.

GCP firewall rules are stateful. This means that if a connection is allowed between the source
and a target or a target and a destination, all subsequent traffic in either direction will be allowed.
In other words, firewall rules allow bi-directional communication once a session is established.
Also, if, for some reason, all firewall rules in a network are deleted, there is still an implied deny
all ingress rule and an implied allow all egress rule for the network.

204
(Refer Slide Time: 03:38)

You should express your desired firewall configuration as a set of firewall rules. Conceptually, a firewall rule is composed of the following parameters. The direction of the rule: inbound connections are matched against ingress rules only, and outbound connections are matched against egress rules only. The source or destination: for the ingress direction, sources can be specified as part of the rule with IP address ranges, source tags, or a source service account; for the egress direction, the destination can be specified as part of the rule with one or more ranges of IP addresses. The protocol and port of the connection: any rule can be restricted to apply to specific protocols only, or to specific combinations of protocols and ports. The rule's action: to allow or deny packets that match the direction, protocol, port, and source or destination of the rule. The priority of the rule, which governs the order in which rules are evaluated; the first matching rule is applied.

And lastly, the rule assignment: by default, all rules are assigned to all instances, but you can assign certain rules to certain instances only. Let's look at some GCP firewall use cases for both egress and ingress.

205
(Refer Slide Time: 05:01)

Egress firewall rules control outgoing connections that originated inside your GCP Network.
Egress allow rules allow outbound connections that match specific protocol, ports, and IP
addresses. Egress deny rules prevent instances from initiating connections that match non-
permitted port, protocol, and IP range combinations. For egress firewall rules, destinations to
which a rule applies may be specified using IP CIDR ranges.

Specifically, you can use a destination range to protect from undesired connections initiated by a
VM instance towards an external destination. For example, an external host. You can also use
destination ranges to protect from undesired connections initiated by a VM instance towards
specific GCP CIDR ranges. For example, a VM in a specific subnet.

206
(Refer Slide Time: 05:56)

Ingress firewall rules protect against incoming connections to the instance from any source. Ingress allow rules allow specific protocols, ports, and IP addresses to connect in. The firewall prevents instances from receiving connections on non-permitted ports or protocols. Rules can be restricted to only affect particular sources. Source CIDR ranges can be used to protect an instance from undesired connections coming either from external networks or from GCP IP ranges.

This diagram illustrates a VM receiving a connection from an external address and another VM receiving a connection from a VM in the same network. You can control ingress connections to a VM instance by constructing inbound connection conditions using source CIDR ranges, protocols, or ports.
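Tying these parameters together, here is a minimal sketch (not part of the original lecture) of an ingress allow rule created with the Compute Engine API via the google-api-python-client library; the project, network, source range, and target tag are hypothetical.

from googleapiclient import discovery

compute = discovery.build("compute", "v1")
project = "my-project"  # hypothetical

compute.firewalls().insert(project=project, body={
    "name": "allow-ssh-from-corp",
    "network": "projects/" + project + "/global/networks/prod-vpc",
    "direction": "INGRESS",
    "allowed": [{"IPProtocol": "tcp", "ports": ["22"]}],   # protocol and port
    "sourceRanges": ["203.0.113.0/24"],                    # hypothetical corporate range
    "targetTags": ["bastion"],                             # rule assignment by instance tag
    "priority": 1000,                                      # lower number = evaluated first
}).execute()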

207
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-57
Multiple VPC Networks

In this next topic, you will find out how multiple VPCs can be used to build robust networking solutions. Shared VPC allows an organization to connect resources from multiple projects to a common VPC network. This allows the resources to communicate with each other securely and efficiently using internal IPs from that network.

(Refer Slide Time: 00:22)

In this diagram, there is one network that belongs to the web application service project. This
network is shared with three other projects, namely the Recommendation Service,
Personalization Service, and Analytics Service. Each of these service projects has instances that
are in the same network as the web application server allowing for private communication to that
server using internal IP addresses. The web application server communicates with clients and on-
premises using the server’s external IP address.

208
The back-end services, on the other hand, cannot be reached externally because they only
communicate using internal IP addresses. When you use shared VPC, you designate a project as
a host project and attach one or more other service projects. In this case, the Web Application
Service project is the host project, and the three other projects are service projects. The overall
VPC network is called the shared VPC network. VPC Network Peering, by contrast, allows private RFC 1918 connectivity across two VPC networks, regardless of whether they belong to the same project or the same organization.

Now, remember that each VPC network will have firewall rules that define what traffic is
allowed or denied between the networks.

(Refer Slide Time: 01:54)

In this diagram, there are two organizations that represent a consumer and a producer, respectively. Each organization has its own organization node, VPC network, VM instances, network admin, and instance admin. In order for VPC Network Peering to be established successfully, the producer network admin needs to peer the producer network with the consumer network, and the consumer network admin needs to peer the consumer network with the producer network.

209
When both peering connections are created, the peering session becomes active and routes are exchanged. This allows the VM instances to communicate privately using their internal IP addresses. VPC Network Peering is a decentralized or distributed approach to multi-project networking, because each VPC network may remain under the control of separate administrator groups and maintains its own global firewall and routing tables.

Historically, such projects would have used external IP addresses or VPNs to facilitate private communication between VPC networks. However, VPC Network Peering does not incur the network latency, security, and cost drawbacks that are present when using external IP addresses or VPNs.
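Here is a minimal sketch (not part of the original lecture) of configuring one side of such a peering with the Compute Engine API via the google-api-python-client library. The project and network names are hypothetical, and the consumer side would run the mirror-image call so that both configurations match.

from googleapiclient import discovery

compute = discovery.build("compute", "v1")

# Producer side: peer the producer network with the consumer network.
compute.networks().addPeering(
    project="producer-project",      # hypothetical
    network="producer-vpc",          # hypothetical
    body={
        "name": "peer-to-consumer",
        "peerNetwork": "projects/consumer-project/global/networks/consumer-vpc",
        "autoCreateRoutes": True,    # exchange subnet routes once both sides are configured
    },
).execute()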

(Refer Slide Time: 03:22)

There are some things to remember when using VPC Network Peering. VPC Network Peering works with Compute Engine, Google Kubernetes Engine, and the App Engine flexible environment. Peered VPC networks remain administratively separate, which means that routes, firewalls, VPNs, and other traffic management tools are administered and applied separately in each of the VPC networks. Each side of the peering association is set up independently.

210
So, peering will be active only when the configuration from both sides matches. This allows either side to delete the peering association at any time. A subnet CIDR prefix in one peered VPC network cannot overlap with a subnet CIDR prefix in another peered network. This means that two auto mode VPC networks that only have the default subnets cannot peer.

(Refer Slide Time: 04:17)

There is one more thing to remember when using VPC Network Peering: only directly peered networks can communicate, meaning that transitive peering is not supported. In other words, if VPC network N1 is peered with N2 and N3, but N2 and N3 are not directly connected, VPC network N2 cannot communicate with VPC network N3 over the peering. This is critical if N1 is a Software-as-a-Service organization offering services to N2 and N3.

211
(Refer Slide Time: 04:50)

Now that you have learned about Shared VPC and VPC Network Peering, let's compare both of these configurations to help you decide which is appropriate for a given situation. If you want to configure private communication between VPC networks in different organizations, you have to use VPC Network Peering; Shared VPC only works within the same organization. Somewhat similarly, if you want to configure private communication between VPC networks in the same project, you have to use VPC Network Peering.

This does not mean that the networks need to be in the same project, but they can be, as you will explore in the upcoming lab. Shared VPC, by contrast, only works across projects. In a Shared VPC, the network administration is centralized; in a VPC Network Peering situation, the network administration is decentralized.

212
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-58
Building Hybrid Clouds

(Refer Slide Time: 00:05)

In the next topic, you will learn how to build a hybrid cloud using GCP. Cloud VPN securely connects an on-premise network to a GCP VPC network through an IPsec VPN tunnel. Traffic traveling between the two networks is encrypted by one VPN gateway and then decrypted by the other VPN gateway. This protects data as it travels over the public internet, and that is why Cloud VPN is useful for low-volume data connections.

As a managed service, Cloud VPN provides an SLA of 99.9% service availability and supports site-to-site VPN. Cloud VPN only supports site-to-site IPsec VPN connectivity; it doesn't support client-to-gateway scenarios. In other words, Cloud VPN doesn't support use cases where client computers need to dial into a VPN using client VPN software. Cloud VPN supports both static routes and dynamic routes to manage traffic between VM instances and existing infrastructure.

213
Dynamic routes are configured with Cloud Router, which is only covered briefly here. Both IKE version 1 and version 2 ciphers are supported. Cloud Interconnect provides two options for extending an on-premise network to a Google Cloud Platform VPC network.

(Refer Slide Time: 01:27)

The Cloud Interconnect options are Dedicated Interconnect and Partner Interconnect. Choosing an interconnect type will depend on connection requirements such as the connection location and capacity.
(Refer Slide Time: 01:46)

214
Dedicated Interconnect provides direct physical connectivity between an organization's on-premise network and the Google Cloud network edge, allowing them to transfer large amounts of data between networks, which can be more cost-effective than purchasing additional bandwidth over the public internet. If 10-gigabit-per-second or 100-gigabit-per-second connections aren't required, Partner Interconnect provides a variety of capacity options. Also, if an organization cannot physically meet Google's network requirements in a colocation facility, it can use Partner Interconnect to connect to a variety of service providers to reach its VPC networks.

Partner Interconnect uses a service provider to enable connectivity between an on-premise network and the Google Cloud network edge, allowing an organization to extend its private network into its cloud network. The service provider can provide solutions that minimize the router requirements on the organization's premises to supporting only an Ethernet interface to the cloud. Let's compare the interconnect options. All of these options provide internal IP address access between resources in an on-premise network and a VPC network.

(Refer Slide Time: 03:10)

The main differences are the connection capacity and the requirements for using a service. The IPsec VPN tunnels that Cloud VPN offers have a capacity of 1.5 to 3 gigabits per

215
second per tunnel and require a VPN device on the on-premise network. The 1.5-gigabit-per-second capacity applies to traffic that traverses the public internet, and the 3-gigabit-per-second capacity applies to traffic that traverses a direct peering link.

Configuring multiple tunnels allows you to scale this capacity. Dedicated Interconnect has a capacity of 10 gigabits per second per link and requires you to have a connection in a Google-supported colocation facility. You can have up to eight links to achieve multiples of 10 gigabits per second, but 10 gigabits per second is the minimum capacity. Partner Interconnect has a capacity of 50 megabits per second to 10 gigabits per second per connection.

The requirements depend on the service provider. The recommendation is to start with VPN tunnels and, depending on your proximity to a colocation facility and your capacity requirements, to switch to Dedicated Interconnect or Partner Interconnect when there is a need for enterprise-grade connections to GCP.

(Refer Slide Time: 04:31)

Google allows an organization to establish a direct peering connection between their business
networks and ours. With this connection, they will be able to exchange internet traffic between
their network and ours at one of Google's broad-reaching edge network locations. Direct

216
peering with Google is done by exchanging Border Gateway Protocol routes between Google
and the peering entity. And after a direct peering connection is in place, they can use it to reach
all of our services, including the full suite of GCP products.

Unlike Dedicated Interconnect, direct peering does not have an SLA. In order to use direct peering, an organization needs to satisfy Google's peering requirements. If an organization requires access to Google's public infrastructure and cannot satisfy those peering requirements, it can connect through a carrier peering service provider.

(Refer Slide Time: 05:25)

Carrier peering enables them to access Google applications such as G Suite by using a service
provider to obtain enterprise-grade Network Services that connect their infrastructure to Google.
When connecting to Google through a service provider, they can get connections with higher
availability and lower latency using one or more links. As with direct peering, Google doesn't offer an SLA with carrier peering, but the network service provider might.

217
(Refer Slide Time: 05:56)

Let’s compare the peering options that you just considered. Both of these options provide public
IP address access to all of our services. The main differences are capacity and the requirements
for using a service. Direct peering has a capacity of 10 gigabits per second per link and requires you to have a connection in a GCP edge point of presence. Carrier peering capacity and
requirements vary depending on the service provider that you work with.

218
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-59
Different Options for Load Balancing

(Refer Slide Time: 00:07)

Now, you will consider the different load balancing options available. You can use load balancing to take more advantage of an augmented infrastructure. You have already configured networking between different virtual machines, but how can you route traffic between multiple virtual machines? Load balancing is the first thing that comes into play. HTTP(S), SSL Proxy, and TCP Proxy load balancing are global services, whereas Network and Internal load balancing are regional.

219
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-60
Quiz

(Refer Slide Time: 00:04)

Now, it’s time to test your knowledge with a short quiz.


What is the key distinguishing feature of networking in GCP?
Network topology isn’t dependent on the IP address layout.

220
(Refer Slide Time: 00:17)

Which one of the following is true?


VPCs are global, and subnets are regional.

(Refer Slide Time: 00:26)

Select the global load balancer from the list.


From this list, the global load balancer is the TCP proxy.

221
Google Cloud Computing Foundation Course
Priyanka Vergadia
Google Cloud

Lecture-61
Summary

(Refer Slide Time: 00:05)

That concludes the Google Cloud network module. Here is a reminder of what you have learned.
Computers connect to each other through networks. GCP delivers millions of customers' software and services around the globe through its global cloud network. IP addresses allow networks to connect internally; they can be either public or private. There are five primary Google networking products: Virtual Private Cloud, Cloud Load Balancing, Cloud CDN, Cloud
Interconnect, and Cloud DNS. VPCs are software-defined network constructs. Google's VPC is
global.

A route is a mapping of an IP range to a destination that also considers firewall rules. Firewalls
protect networks from unapproved connections.

222
(Refer Slide Time: 00:51)

Shared VPC allows an organization to connect resources from multiple projects to a common
VPC network. Direct peering provides a direct connection between a business network and
Google. Carrier peering provides connectivity through a supported partner. Finally, load
balancing can be used to take advantage of an augmented infrastructure.

223
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-62
Module introduction

Hey there, it's Seth again, and welcome to the module Let Google keep an eye on things. In this
module, you’ll learn about automation and monitoring in the Cloud. So far, in this Google Cloud
Computing Foundations Course, you’ve learned what the Cloud is, the Google Cloud Platform,
and how to use Google Cloud Platform to build applications. You then explored storage options,
the role of APIs, cloud security, and networking.

In this module, you’ll look at the role Google Cloud can play in automating the creation and
management of your GCP resources as well as ensuring your infrastructure and applications are
running optimally.

(Refer Slide Time: 00:40)

The objective of this module is to identify cloud automation and management tools. The specific learning objectives to achieve this include introducing Infrastructure as Code, discussing Google Cloud Deployment Manager as an Infrastructure as Code option, and explaining the role of monitoring,

224
logging, tracing, debugging, and error reporting in the Cloud, and describing the use of Stackdriver for monitoring, logging, tracing, debugging, and error reporting.

(Refer Slide Time: 01:09)

This agenda shows the topics that make up the module. The module starts with an introduction to Infrastructure as Code before moving on to Cloud Deployment Manager as an Infrastructure as Code tool. You'll then learn the importance of monitoring and managing existing services,
applications, and infrastructure before you look at the integrated tools that make up Google
Stackdriver. To end the module, you’ll complete a hands-on lab where you’ll monitor the metrics
provided by project virtual machines in Stackdriver before you complete a short quiz and review
the main learning points of the module.

225
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-63
Introduction to IaC

(Refer Slide Time: 00:06)

Let's start with the first topic, where we discuss the concept of Infrastructure as Code. As the name implies, Infrastructure as Code, or IaC, takes what a required infrastructure needs to look
like and defines that as code. You capture the code in a template file that is both human-readable
and machine consumable. Infrastructure as Code tools allow you to provision entire
infrastructure stacks from templates rather than use a web console or run commands manually to
build all the parts of the system.

The template is used to build the infrastructure automatically. That same template enables
resources to be automatically updated or deleted as required. Because templates are treated as
code, they can be stored in repositories, tracked using version control systems, and shared with
other users and teammates. Templates can also be used for disaster recovery. If the infrastructure
needs to be rebuilt for any reason, those templates can be used to recover automatically.

226
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-64
Cloud Deployment Manager

In this topic, you'll learn about Google Cloud Deployment Manager, GCP's Infrastructure as Code tool. Cloud Deployment Manager is an Infrastructure as Code tool to manage GCP resources. Setting up an environment in GCP can entail many tasks, including setting up compute, network, and storage resources and then keeping track of their configurations. You can
do this all by hand if you want, but it is far more efficient to use a template, which is a
specification of what the environment should look like. Cloud Deployment Manager allows you
to do this.

(Refer Slide Time: 00:35)

You create template files that describe what you want the components of your environment to
look like. This allows the process of creating these resources to be repeated over and over with
very consistent results. You can focus on the set of resources that comprise the application or
service instead of separately deploying each service resource. One resource definition can also
reference another resource, creating dependencies, and controlling the order of execution.
Adding, deleting, or changing resources in the deployment is also a lot easier.

227
(Refer Slide Time: 01:11)

Many tools use an imperative approach, which requires you to define the steps to take to create and configure resources. A declarative approach allows you to specify what the configuration should be and lets the system figure out the exact steps to take to get there. Cloud Deployment Manager allows you to specify all the resources needed for your application in a declarative format using YAML. YAML is a human-readable data serialization language commonly used for configuration files.
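As a minimal sketch of what such a declarative configuration might look like (MY_PROJECT and the resource name are placeholders), a single Compute Engine instance could be described in a file such as vm.yaml:

    resources:
    - name: example-vm
      type: compute.v1.instance
      properties:
        zone: us-central1-a
        machineType: https://www.googleapis.com/compute/v1/projects/MY_PROJECT/zones/us-central1-a/machineTypes/e2-medium
        disks:
        - deviceName: boot
          type: PERSISTENT
          boot: true
          autoDelete: true
          initializeParams:
            sourceImage: https://www.googleapis.com/compute/v1/projects/debian-cloud/global/images/family/debian-11
        networkInterfaces:
        - network: https://www.googleapis.com/compute/v1/projects/MY_PROJECT/global/networks/default

The deployment itself is then created with a command along the lines of gcloud deployment-manager deployments create my-deployment --config vm.yaml.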

228
(Refer Slide Time: 01:45)

Templates allow the use of building blocks to create abstractions or sets of resources that are typically deployed together, such as an instance template, an instance group, or even an autoscaler. You can use Python or Jinja2 templates to parameterize the configuration, allowing them to be used repeatedly by changing input values that define which image to deploy, which zone to deploy to, or how many virtual machines to deploy. You can also pass in variables like zone, machine size, number of machines, and whether it is a test, production, or staging environment.

You can pass these into those templates and get output values back, such as the IP addresses assigned or links to the instance.
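As a hedged sketch of that parameterization (file and property names are illustrative, and only the parameterized fields of the VM resource from the earlier configuration are shown for brevity), a Jinja2 template references properties supplied by the configuration that imports it:

    # vm_template.jinja - receives its values from the importing configuration
    resources:
    - name: {{ env["name"] }}-vm
      type: compute.v1.instance
      properties:
        zone: {{ properties["zone"] }}
        machineType: https://www.googleapis.com/compute/v1/projects/{{ env["project"] }}/zones/{{ properties["zone"] }}/machineTypes/{{ properties["machineType"] }}

    # config.yaml - imports the template and passes in the input values
    imports:
    - path: vm_template.jinja
    resources:
    - name: test-vm
      type: vm_template.jinja
      properties:
        zone: us-central1-a
        machineType: e2-medium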

229
(Refer Slide Time: 02:32)

In addition to Google Cloud Deployment Manager, which is specific to Google Cloud and cannot be used with other cloud providers, Google has a team of engineers dedicated to ensuring that Google Cloud support is also available in popular third-party open-source tools that support Infrastructure as Code.

230
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-65
Monitoring and Managing Your Services, Apps, and Infrastructure

In this topic, you'll be introduced to the important activities that need to be performed to monitor
and manage existing services, applications, and infrastructure. There are a number of activities
that are essential in managing existing services, applications, and infrastructure.

(Refer Slide Time: 00:18)

You need visibility into the performance, uptime, and overall health of web applications and other internet-accessible services running in your cloud environment. This includes gathering metrics, events, and metadata from applicable applications, platforms, and components.

231
(Refer Slide Time: 00:38)

You need to search, filter, and view logs from your Cloud and open-source applications.
Application errors should be reported and aggregated as alerts.

(Refer Slide Time: 00:49)

Latency management is an important part of managing the overall application performance. It's
important to be able to answer questions like, how long does it take my application to handle a

232
request, or why do some of my requests take longer than others, or what's the overall latency of
all requests to my application.

(Refer Slide Time: 01:11)

In the event that a bug does exist (and we know that never happens), you need to inspect the state of your running applications in real-time to investigate your code's behavior and determine the cause of the problem.

233
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-66
Stackdriver

In the previous topic, you saw that certain key activities need to be performed to monitor and
manage existing services, applications, and infrastructure. In this topic, you will learn how
Stackdriver can be used for monitoring, logging, error reporting, tracing, and debugging your
applications in the Cloud.

(Refer Slide Time: 00:21)

Stackdriver provides powerful monitoring, logging, and diagnostics for applications on Google
Cloud Platform. It equips you with insight into the health, performance, and availability of
Cloud-powered applications enabling you to find and fix issues faster. Stackdriver gives you
access to many different kinds of signals for your infrastructure platforms, virtual machines,
containers, middleware, and all your application tiers, including logs, metrics, and traces. It gives
you insight into your application's health, performance, and availability. So, if issues occur, you
can fix them faster.

234
(Refer Slide Time: 01:00)

Let's start by looking at Stackdriver Monitoring, a full-stack monitoring service that discovers and monitors Cloud resources automatically. Flexible dashboards and rich visualization tools
help you identify emerging issues. Anomaly reporting, pattern detection, and exhaustion
prediction provide insights into longer-term trends that may require attention. Monitoring
provides a single integrated service for metrics, dashboards, uptime monitoring, and alerting.
This means that you spend less time maintaining disparate systems. Advanced alerting
capabilities, including the rate of change, cluster aggregation, and multi-condition policies, help
ensure you are notified when critical issues occur while reducing the likelihood of a false
positive.

Integrated uptime monitoring and health checks ensure quick notification of failures. It's possible
to drill down from the alerts dashboard to logs and traces to identify the root cause of a problem
quickly.

235
(Refer Slide Time: 02:06)

Stackdriver logging is a real-time log management and analysis service. Stackdriver Logging is a
fully integrated solution that works seamlessly with Stackdriver monitoring, Stackdriver trace,
Stackdriver error reporting, and Stackdriver debugger. The integration allows users to navigate
between incidents, charts, traces, errors, and logs. This helps users quickly find the root cause of
issues in their applications and systems.

Logging is built to scale and works well at sub-second ingestion latency, even at terabytes per second. Logging is a fully managed solution that takes away the overhead of deploying or managing a cluster, letting you focus your energy on innovation and building your product. Logging
provides a central place for all your logs. You can also configure Stackdriver to export logs to
other systems automatically.

Stackdriver Logging allows you to analyze high-volume application and system-level logs in real-time. Advanced log analysis can be achieved by combining the power of the Stackdriver suite
with the data and analytics products of Google Cloud Platform. For example, you can create
powerful real-time metrics from the log data and analyze that log data in real-time using a tool
like BigQuery.
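For example, here is a hedged sketch of exporting logs to BigQuery with the gcloud logging command; the project, dataset, and sink names are placeholders.

    # Create a sink that exports Compute Engine instance logs to a BigQuery dataset
    gcloud logging sinks create my-bq-sink \
        bigquery.googleapis.com/projects/my-project/datasets/my_log_dataset \
        --log-filter='resource.type="gce_instance"'

The command reports a writer service account for the sink, which then needs to be granted access to the destination dataset before logs start flowing.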

236
(Refer Slide Time: 03:31)

Stackdriver error reporting allows you to identify and understand application errors through real-
time exception monitoring and alerting. Error reporting also allows you to see your application's
top errors in a single dashboard. Real production problems can often be hidden across mountains
of data. Stackdriver error reporting helps you see problems through the noise by constantly
analyzing exceptions and intelligently aggregating them into meaningful groups that are tailored
to your programming language or framework.

Stackdriver error reporting is constantly watching your service and instantly alerts you when a
new application error cannot be grouped together with the existing ones. You can directly jump
from a notification to the details of a new error. The exception stack trace parser is able to process Go, Java, .NET, Node.js, PHP, Python, and Ruby. You can also use the Google client libraries and REST APIs to send errors with Stackdriver Logging.
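As a hedged illustration (the service name and the failing function are made up), reporting a caught exception with the Python client library might look like this:

    # Sketch: report a handled exception to Stackdriver Error Reporting
    from google.cloud import error_reporting

    client = error_reporting.Client(service="my-service")  # illustrative service name

    def risky_operation():
        raise ValueError("simulated failure")  # stand-in for real application logic

    try:
        risky_operation()
    except Exception:
        # Sends the current exception and its stack trace to Error Reporting,
        # where it is grouped with similar errors
        client.report_exception()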

237
(Refer Slide Time: 04:36)

Stackdriver Trace is a distributed tracing system that collects latency data from applications and
displays it in the Google Cloud Console. Using Stackdriver Trace, you can inspect detailed latency information for just a single request or view aggregate latency across your entire application. You can
quickly find where bottlenecks are occurring and more quickly identify their root causes.
Stackdriver trace continuously gathers and analyzes data from applications to automatically
identify changes to an application's performance.

These latency distributions are available through the analysis reports feature. They can be
compared over time or versions, and Stackdriver trace will automatically generate an alert if it
detects a significant shift in an application's latency profile. The language-specific SDKs of
Stackdriver trace can analyze projects running on virtual machines. The Stackdriver trace SDK is
currently available for Java, NodeJS, Ruby, and Go.

And the Stackdriver trace API can be used to submit and retrieve trace data from any source or
any language. A Zipkin collector is also available, which allows Zipkin tracers to submit data to
Stackdriver trace. Stackdriver trace works out of the box on many GCP services, including App
Engine.

238
(Refer Slide Time: 06:00)

Stackdriver Debugger is a feature of Google Cloud that lets you inspect the state of a running application in real-time without stopping it or slowing it down. Stackdriver Debugger can be used with production applications: with just a few mouse clicks, you can take a snapshot of your running application's state or inject a new logging statement. That snapshot captures the call stack and variables at a specific code location the first time any instance executes that code.

The injected logpoint behaves as part of the deployed code, writing its log messages to the same log stream. Stackdriver Debugger is easier to use when the source code is available to Google
Cloud. It knows how to display the correct version of the source code when a version control
system such as Google Cloud source repositories, GitHub, Bitbucket, or GitLab is available.
Users can easily collaborate with other teammates by sharing their debug sessions.

Sharing a debug session in Stackdriver debugger is as easy as sending the console URL.
Stackdriver debugger is integrated into the existing developer workflows you are familiar with.
You can launch Stackdriver Debugger and take snapshots directly from Stackdriver Logging, Error Reporting, dashboards, integrated development environments, and even the gcloud command-line interface.

239
(Refer Slide Time: 07:22)

Poorly performing code increases the latency and cost of applications and web services every day. Stackdriver Profiler continuously analyzes the performance of CPU- or memory-intensive functions executed across your applications. While it's possible to measure code performance in development environments, the results generally do not map well to what is happening in production. Unfortunately, many production profiling techniques either slow down code execution or only inspect a small subset of the codebase.

Stackdriver Profiler uses statistical techniques and extremely low-impact instrumentation that runs across all production application instances to provide a complete picture of an application's performance without slowing it down. Stackdriver Profiler allows developers to analyze applications running anywhere, including GCP, other cloud platforms, and on-premises, with support for Java, Go, Node.js, and Python.
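As a hedged sketch (the service name is illustrative), a Python service typically starts the profiling agent once at startup:

    # Sketch: start the Stackdriver Profiler agent in a Python application
    import googlecloudprofiler

    try:
        googlecloudprofiler.start(
            service="my-service",       # illustrative name used to group profiles
            service_version="1.0.0",
        )
    except (ValueError, NotImplementedError) as exc:
        # Profiling is best-effort; the application keeps running if it cannot start
        print("Profiler agent not started:", exc)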

240
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-67
Quiz

(Refer Slide Time: 00:04)

Now it is time to test your knowledge with a short quiz. Which of the following best describes
Infrastructure as Code? Infrastructure as Code tools are used to automate the construction of an
entire infrastructure.

241
(Refer Slide Time: 00:14)

Which of the following is true about Cloud Deployment Manager? Cloud Deployment Manager is a declarative tool: you create a configuration file in YAML format that describes the desired configuration of the infrastructure.

(Refer Slide Time: 00:27)

242
What Stackdriver service allows you to inspect detailed latency information for a single request
or view aggregate latency for your entire application? Stackdriver Trace is used to sample the
latency of an application.

243
Google Cloud Computing Foundation Course
Seth Vargo
Google Cloud

Lecture-68
Summary

(Refer Slide Time: 00:07)

This concludes the let Google keep an eye on things module. Let me remind you of what you
have learned. Infrastructure as Code is taking what a required infrastructure needs to look like
and defining it as code. Infrastructure as Code tools allow for the creation of entire architectures
through templates that serve as configuration files. Google Cloud Deployment Manager is an
Infrastructure as Code tool to manage Google Cloud resources.

Cloud Deployment Manager uses a declarative approach that allows you to specify the configuration and lets the system figure out the steps to take to get there. Stackdriver Monitoring provides visibility into the performance, uptime, and overall health of web applications and other internet-accessible services running in your Cloud environment. Stackdriver Logging allows you
to store, search, analyze, monitor, and alert on log data and events in a single place.

244
(Refer Slide Time: 01:03)

Stackdriver Error Reporting counts, analyzes, and aggregates the crashes in your running cloud
services in real-time. And Stackdriver Trace allows you to inspect detailed latency information
for a single request or view aggregate latency information for your entire application. Stackdriver
Debugger lets you inspect the state of a running application in real-time without stopping it or
slowing it down. And finally, Stackdriver Profiler continuously gathers and analyzes the performance of CPU- and memory-intensive functions executed across your production applications.

245
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-69
Module Introduction

Hi, welcome to You have the data, but what are you doing with it? I am Evan, a technical curriculum developer here at Google. In this module, you learn how you can gain insight from data using managed Big Data services. So far in this Google Cloud Computing Foundations Course, you have discussed what Cloud Computing is, the Google Cloud Platform, and using GCP to build apps. You then explored storage options, the role of APIs, cloud security, networking, and the role GCP can play in automating the creation and management of your GCP resources. In
this module, you look at some of the managed services that Google offers to process your Big
Data.

(Refer Slide Time: 00:39)

The objective of this module is for you to discover a variety of managed Big Data services in the Cloud. The specific learning objectives to achieve this include being able to discuss the Big Data managed services available in the Cloud, describe the use of Cloud Dataproc to run Spark, Hive, Pig, and MapReduce as a managed service, explain the building of extract, transform, and

246
load (ETL) pipelines using Cloud Dataflow, and discuss BigQuery as a managed data warehouse and analytics engine.

(Refer Slide Time: 01:09)

This agenda shows the topics that make up the module. The module starts with an introduction to Big Data managed services in the Cloud before moving on to how Big Data operations can be leveraged through Cloud Dataproc. You'll then complete two labs, where you use the GCP Console and then the gcloud command-line tool to create a Cloud Dataproc cluster and perform various tasks. After the labs, you will explore the use of Cloud Dataflow to perform
extract, transform, and load operations.

The next two labs provide an opportunity to learn more about Cloud Dataflow. In the first, you'll create a streaming pipeline using a Cloud Dataflow template, and in the second, you will set up a Python development environment, get the Cloud Dataflow SDK for Python, and run an example pipeline using the GCP Console. In the final topic, you will learn about the role of BigQuery as a
data warehouse. You will complete the module with a short quiz and a review of the main
learning points of the module.

247
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-70
Introduction to Big Data Managed Services in the Cloud

Let's start with the first topic, where you'll be introduced to Big Data managed services in the Cloud. Before we discuss them, let's take a moment to conceptualize Big Data. Enterprise storage systems are leaving the terabyte behind as a measure of data size, with petabytes becoming the norm. We know that one petabyte is 1 million gigabytes, or 1,000 terabytes, but how big is that, really?

(Refer Slide Time: 00:27)

From one perspective, a petabyte of data might seem like more than you will ever need. For
example, you would need a stack of floppy disks higher than 12 Empire State Buildings to store
one petabyte.

248
(Refer Slide Time: 00:39)

If you wanted to download one petabyte over a 4G network, you'd need to wait around for 27
years.

(Refer Slide Time: 00:48)

249
You would also need one petabyte of storage for every tweet ever tweeted, multiplied by 50. OK, so one
petabyte is pretty big.

(Refer Slide Time: 00:59)

Looking at it from another perspective, though, one petabyte is only enough to store two micrograms of DNA or one day's worth of video uploaded to YouTube. So, for some industries, a petabyte of data might not be that much at all. Every company saves data in some way, and 90% of the data saved by companies is unstructured. With all this data available, companies are now trying
to gain some insight into their business based on the data that they have.

This is where Big Data comes in. Big Data architectures allow companies to analyze their saved data to learn more about their business. In this module, you will focus on three managed services that Google offers to process that data. For companies that have already invested in Apache Hadoop and Apache Spark and would like to continue using them,

250
(Refer Slide Time: 01:45)

Cloud Dataproc provides a great way to run open-source software in Google Cloud. However,
companies looking for a streaming data solution may be more interested in Cloud Dataflow as a
managed service. Cloud Dataflow is optimized for large scale batch processing or long-running
stream processing of both structured and unstructured data.

The third managed service we will look at is BigQuery, which provides a data analytics solution optimized for getting questions answered rapidly over petabyte-scale datasets. BigQuery allows for fast SQL, or Structured Query Language, queries on top of your structured data.

251
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-71
Leverage Big Data Operations with Cloud Dataproc

In this topic, you'll learn how Cloud Dataproc provides a fast, easy, and cost-effective way to run Apache Hadoop and Apache Spark, which are open-source Big Data technologies that support Big Data operations.

(Refer Slide Time: 00:14)

Hadoop and Spark are open-source technologies that often form the backbone of Big Data processing. Hadoop is a set of tools and technologies that enables a cluster of computers to store and process large volumes of data. It intelligently ties together individual computers in a cluster to distribute the storage and processing of that data. Apache Spark is a unified analytics engine for large-scale data processing that achieves high performance for both batch and streaming data.

Cloud Dataproc is a managed Spark and Hadoop service that lets you take advantage of the
open-source data tools for batch processing, querying, streaming, and machine learning. Cloud
Dataproc automation helps you quickly create clusters and manage them easily. And because

252
clusters are typically run ephemerally, meaning they are short-lived, you will save money, as they are turned off when you do not need that processing power anymore. Let's take a look at the key
features of Cloud Dataproc.

(Refer Slide Time: 01:17)

Cloud Dataproc is priced at 1 cent per virtual CPU per cluster per hour, on top of any other GCP resources that you use. In addition, Cloud Dataproc clusters can include preemptible instances that have lower compute prices. You use and pay for things only when you need them and not when you do not. Cloud Dataproc clusters are quick to start, scale, and shut down, with each of these operations taking 90 seconds or less on average.

Clusters can be created and scaled quickly with a variety of virtual machine sizes and types, numbers of nodes, and networking options. You can use Spark and Hadoop tools, libraries, and documentation with Cloud Dataproc. Cloud Dataproc provides frequent updates to native versions of Spark, Hadoop, Pig, and Hive, so there is no need to learn new tools or APIs. It is possible to move your existing projects or ETL pipelines to Google Cloud without redevelopment.

253
You can easily interact with clusters and Spark or Hadoop jobs, without the assistance of an administrator or special software, through the GCP Console, the Cloud SDK, or the Cloud Dataproc REST API. When you are done with a cluster, simply turn it off, so money is not spent on an idle cluster. Image versioning allows you to switch between different versions of Apache Spark, Apache Hadoop, and other tools.

The built-in integration with Cloud Storage, BigQuery, and Cloud Bigtable ensures data will never be lost, even when your cluster is turned down. Together with Stackdriver Logging and Stackdriver Monitoring, this provides a complete data platform and not just a Spark or Hadoop cluster. For example, you can use Cloud Dataproc to effortlessly ETL terabytes of raw log data directly into BigQuery for your business reporting needs.

So, how does Cloud Dataproc work? Spin up a cluster when needed, for example, to answer a
specific query or run a specific ETL job.
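A hedged sketch of that lifecycle with the gcloud command-line tool follows; the cluster, region, and bucket names are placeholders.

    # Create an ephemeral cluster, run one PySpark job, then delete the cluster
    gcloud dataproc clusters create ephemeral-cluster \
        --region=us-central1 --num-workers=2

    gcloud dataproc jobs submit pyspark gs://my-bucket/jobs/etl_job.py \
        --cluster=ephemeral-cluster --region=us-central1

    gcloud dataproc clusters delete ephemeral-cluster --region=us-central1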

(Refer Slide Time: 03:25)

The architecture depicted here provides insight into how the cluster remains separate yet easily
integrates with other important functionalities, for example, logging via Stackdriver and Cloud
BigTable instead of Hbase. This contributes to the ability of Cloud Dataproc to run ephemerally

254
and, therefore, efficiently and cost-effectively. The Cloud Dataproc approach allows users to use
Hadoop, Spark, Hive, and Pig when they need it.

Again, as we mentioned, it only takes 90 seconds on average from the moment users request the resources to the moment they can submit their first job. What makes this possible is the separation of storage and compute, which is a real game-changer.

(Refer Slide Time: 04:08)

With the traditional approach of typical on-premises clusters, the storage, in the form of hard drives, is attached to each of the nodes in the cluster. If the cluster is not available due to maintenance, neither is the storage. Since the storage is attached to the same computing nodes as those that do the processing, there is often contention for resources, for example, input and output bottlenecks on the cluster. Cloud Dataproc, on the other hand, relies on storage resources being separated from the computing resources.

Files are stored on Google Cloud Storage and accessed through the Cloud Storage connector, meaning that using Google Cloud Storage instead of HDFS is as easy as changing the prefix in your scripts from hdfs:// to gs://.
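For example, a hedged PySpark sketch of that prefix change; the paths and bucket name are hypothetical.

    # Same job, different storage prefix
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("log-etl").getOrCreate()

    # On a traditional Hadoop cluster, reading from HDFS:
    logs = spark.read.text("hdfs:///data/logs/2024-01-01/")

    # On Cloud Dataproc with the Cloud Storage connector, the same read from a bucket:
    logs = spark.read.text("gs://my-bucket/data/logs/2024-01-01/")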

255
(Refer Slide Time: 04:53)

Also, consider Cloud Dataproc in terms of Hadoop and Spark jobs and workflows. The workflow template allows users to configure and execute one or more jobs. It's important to remember that beyond making the process easier, for example, by allowing the user to focus on jobs and view the logs in Stackdriver, they can always access the Hadoop components and applications, such as the YARN web UI, running on their Cloud Dataproc cluster if they want to.

(Refer Slide Time: 05:23)

256
To run a cluster when needed for a given job to answer a specific query, this architecture shows
what is possible and how it can integrate with managed services running outside the cluster, for
example, logging and monitoring through Stackdriver or Cloud BigTable instead of traditional
HBase.

(Refer Slide Time: 05:40)

Let's look at a few of those use cases, starting with how Cloud Dataproc can help with log processing. In this example, a customer processes 50 gigabytes of text log data per day from several sources to produce aggregated data that is then loaded into databases, from which metrics are then gathered for things like daily reporting, management dashboards, and analysis. Up until now, they have used a dedicated on-premises cluster to store and process the logs with MapReduce.

So what is the solution? Firstly, Cloud Storage can act as a landing zone for the log data at a low
cost. A Cloud Dataproc cluster can then be created in less than two minutes to process it with
existing MapReduce code. Once completed, the Cloud Dataproc cluster can be removed immediately, as it is not needed anymore. In terms of value, instead of a cluster running all the time and incurring costs when it is not used,

257
Cloud Dataproc only runs to process those logs, which saves money and reduces your overall
complexity.

(Refer Slide Time: 06:47)

The second use case looks at how Cloud Dataproc can help with ad-hoc data analysis. In this example, an organization's analysts rely on, and are comfortable with, using the Spark shell. However, their IT department is concerned about the increase in usage and how to scale their cluster, which is running in standalone mode. As a solution, Cloud Dataproc can create clusters that scale for speed and
mitigate any single point of failure.

Since Cloud Dataproc supports Spark, Spark SQL, and PySpark, they could use the web interface, the Cloud SDK, or the native Spark shell via SSH. In terms of value, Cloud Dataproc
quickly unlocks the power of the Cloud for anyone without adding technical complexity.
Running complex computations now takes seconds instead of minutes or hours on-premise.

258
(Refer Slide Time: 07:41)

The third use case looks at how Cloud Dataproc can help with machine learning. In this example, a customer uses Spark machine learning libraries to run classification algorithms on very large data sets. They rely on cloud-based machines onto which they install and customize Spark. Because Spark and the machine learning libraries can be installed on any Cloud Dataproc cluster, the customer can save time by quickly creating Cloud Dataproc clusters.

Any additional customization can be applied easily to the entire cluster through what are called initialization actions. To keep an eye on workflows, they can use the built-in Cloud logging and monitoring solutions. In terms of value, with Cloud Dataproc, resources can be focused on the data rather than spent on things like cluster creation and management. Also, integrations
with other new GCP products can unlock new features for your Spark clusters.

259
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-72
Build ETL Pipelines using Cloud Dataflow

(Refer Slide Time: 00:10)

In this topic, you learn how to use Cloud Dataflow to perform extract, transform, and load operations. Cloud Dataflow offers simplified streaming and batch data processing. It is a data processing service based on Apache Beam that lets you develop and execute a range of data processing patterns: extract, transform, and load (ETL), batch, and streaming. You use Cloud Dataflow to build data pipelines, monitor their execution, and transform and analyze the data.

Importantly the same pipelines, the same code that you are going to write, work for batch data
and streaming data. You will explore pipelines more in detail shortly. Cloud Dataflow fully
automates operational tasks like resource management and performance optimization for your
pipeline. All resources are provided on-demand and automatically scale to meet requirements.
Cloud Dataflow provides built-in support for fault-tolerant execution that is consistent and
correct regardless of data size, cluster size, processing pattern, or even the complexity of your
pipeline.

260
Through its integration with the GCP Console, Cloud Dataflow provides statistics such as pipeline throughput and lag, as well as consolidated worker log inspection, all in near real-time. It integrates with Cloud Storage, Cloud Pub/Sub, Cloud Datastore, Cloud Bigtable, and BigQuery for seamless data processing. It is the glue that can hold it all together, and it can also be extended to interact with other sources and sinks, like Apache Kafka and HDFS.

(Refer Slide Time: 01:42)

Google provides quickstart templates for Cloud Dataflow to rapidly deploy a number of useful data pipelines without requiring any Apache Beam programming experience. The templates also remove the need to develop the pipeline code and, with it, the need to manage the component dependencies in that pipeline code. In a lab later, you will create a streaming pipeline using one of these Google Cloud Dataflow templates.
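As a hedged sketch, running Google's provided Word_Count template from the command line might look like this; the output bucket is a placeholder.

    gcloud dataflow jobs run wordcount-example \
        --gcs-location=gs://dataflow-templates/latest/Word_Count \
        --region=us-central1 \
        --parameters=inputFile=gs://dataflow-samples/shakespeare/kinglear.txt,output=gs://my-bucket/results/output

Let's now look at pipelines in more detail.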

261
(Refer Slide Time: 02:15)

A pipeline represents a complete process on one or more data sets. The data can be brought in
from external data sources. It could then have a series of transformation operations such as
filters, joins, aggregations, etc. applied to that data to give it some meaning and to achieve its
desired form. This data could then be written to a sink. The sink could be within GCP or
external. The sink could be even the same as the data source.

The pipeline itself is what is called a Directed Acyclic Graph, or DAG. PCollections are specialized containers of nearly unlimited size representing a set of data in the pipeline. These datasets can be bounded, also referred to as fixed-size, such as national census data, or unbounded, such as a Twitter feed or data from weather sensors coming in continuously. PCollections are the input and the output of every single transform operation.

Transforms are the data processing steps inside your pipeline. A transform takes one or more PCollections, performs an operation that you specify on each element in those collections, and produces one or more PCollections as output. A transform can perform nearly any kind of
processing operation, including performing mathematical computations on data, converting data
from one format to another, grouping data together, reading and writing data, filtering data to
only the elements that you want, or combining data elements into single data values.

262
Source and sink APIs provide functions to read data into and out of collections. The sources act
as the roots of the pipeline, and the sinks are the endpoints of the pipeline. Cloud Dataflow has a
set of built-in sinks and sources, but it is also possible to write sources and sinks for custom data
sources too. Let’s look at different pipeline examples to get a sense of the processing capabilities
of Cloud Dataflow.
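Before the slide examples, here is a minimal hedged sketch of these concepts (pipeline, PCollection, transform, source, sink) in Python using the Apache Beam SDK; the bucket paths are placeholders.

    # A tiny pipeline: source -> transform -> sink
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            # Source: read lines of text into a PCollection
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/names.txt")
            # Transform: keep only the names that start with "A"
            | "FilterA" >> beam.Filter(lambda name: name.startswith("A"))
            # Sink: write the resulting PCollection back out to Cloud Storage
            | "Write" >> beam.io.WriteToText("gs://my-bucket/output/a_names")
        )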

(Refer Slide Time: 04:28)

In this multiple transform pipeline example, data read from BigQuery is filtered into two
collections based on the initial character of the name. Note that the inputs in these examples
could be from a different data source and that this pipeline does not go so far as to reflect an
output. In this merge pipeline example, we are taking the data that was filtered into a collection
in our previous multiple transform pipeline and merging those two datasets. This leaves us with a
single data set with names that start with A and B.

263
(Refer Slide Time: 05:05)

In this multiple-input pipeline example, we are doing joins from different data sources. The job
of Cloud Dataflow is to ingest data from one or more sources if necessary in parallel transform
that data and then load the data into one or more sinks. Google services can be used as both a
source and a sink.

(Refer Slide Time: 05:26)

264
In this simple but real example, the Cloud Dataflow pipeline reads data from a BigQuery table (the source), processes it in various ways (the transforms), and writes its output to Google Cloud Storage (the sink). Some of the transforms in this example are Map operations, and some are Reduce operations. You can build expressive pipelines. Each step in the pipeline is elastically scaled, so there is no need to launch and manage your own cluster; instead, the service provides all the resources on-demand.

It has automated and optimized work partitioning built in, which can dynamically rebalance lagging work. That reduces the need to worry about hot keys, that is, situations where disproportionately large chunks of your input get mapped to the same cluster.

(Refer Slide Time: 06:23)

We’ve discussed Cloud Dataproc and Cloud Dataflow as managed service solutions for
processing your Big Data. This flow chart summarizes what differentiates one from the other.
Both Cloud Dataproc and Cloud Dataflow can perform MapReduce operations. The biggest
difference between them is that Cloud Dataproc works similarly to how Hadoop would work on physical infrastructure: you would still create a cluster of servers to perform your ETL jobs.

265
In the case of Cloud Dataflow, the process is serverless. You provide the Java or Python code
and leverage the Apache Beam SDK to perform ETL operations on batch and streaming data in a
serverless fashion.

266
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-73
BigQuery Googles Enterprise Data Warehouse

In this last topic, you learn about BigQuery, Google's fully managed, petabyte-scale, low-cost analytics data warehouse. BigQuery is serverless: there is no infrastructure to manage, and you do not need a database administrator. BigQuery is a powerful Big Data analytics platform used by all types of organizations, from startups to Fortune 500 companies. A short animated video follows that introduces BigQuery and how it helps to handle the complexity of today's data.

(Refer Slide Time: 00:31)

The BigQuery service replaces the typical hardware setup for a traditional data warehouse; that is, it serves as a collective home for all the analytical data inside your organization.

267
(Refer Slide Time: 00:43)

Datasets are collections of tables, views, and now even machine learning models that can be
divided along business lines or a given analytical domain. Each dataset is tied to a GCP project.

(Refer Slide Time: 00:56)

268
A data lake may contain files in Google Cloud Storage or Google Drive, or transactional data in Cloud Bigtable. BigQuery can define a schema and issue queries directly against these external data sources; these are called federated queries.
(Refer Slide Time: 01:12)

Database tables and views function the same way in BigQuery as they do in a traditional data warehouse, allowing BigQuery to support queries written in a standard SQL dialect that is ANSI 2011 compliant.

(Refer Slide Time: 01:26)

269
Cloud Identity and Access Management is used to grant permission to perform specific actions inside BigQuery. This replaces the SQL GRANT and REVOKE statements that you might have seen before to manage access permissions in traditional SQL databases. Traditional data warehouses are hard to manage and operate. They were designed for a batch paradigm of data analytics and for operational reporting needs, and the data in the data warehouse was meant to be used by only a few management folks for reporting purposes.

BigQuery, by contrast, is a modern data warehouse that changes the conventional mode of data
warehousing. Let's look at some of those key comparisons between the traditional data
warehouse and what you get with BigQuery.

(Refer Slide Time: 02:12)

BigQuery provides a mechanism for automated data transfer and powers applications that your team already knows and uses, so everyone has access to data insights. You can create read-only shared data sources that both internal and external users can query, and then make those query results accessible to anyone through user-friendly tools such as Google Sheets, Looker, Tableau, Qlik, or Google Data Studio.

270
BigQuery lays the foundation for AI. It's possible to train TensorFlow and Google Cloud
machine learning models directly with datasets stored in BigQuery. And BigQuery ML can be
used to build and train machine learning models using just SQL. It is my favorite feature.
Another extended capability is BigQuery GIS, which allows organizations to analyze geographic
data in BigQuery, essential to many critical business decisions that revolve around location data.
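As a hedged sketch of BigQuery ML (the dataset, table, and column names are made up), a model is created and then queried entirely in SQL:

    -- Train a simple regression model directly on a BigQuery table
    CREATE OR REPLACE MODEL `mydataset.trip_duration_model`
    OPTIONS (model_type = 'linear_reg',
             input_label_cols = ['duration_minutes']) AS
    SELECT start_station_id,
           EXTRACT(HOUR FROM start_time) AS start_hour,
           duration_minutes
    FROM `mydataset.bike_trips`;

    -- Get predictions from the trained model
    SELECT *
    FROM ML.PREDICT(MODEL `mydataset.trip_duration_model`,
                    (SELECT start_station_id, 9 AS start_hour
                     FROM `mydataset.stations`));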

BigQuery allows organizations to analyze business events in real-time by automatically ingesting data and making it immediately available to query inside their data warehouse. This is supported by the ability of BigQuery to ingest up to 100,000 rows of data per second (as of this recording)
and for petabytes of data to be queried at lightning-fast speeds. Due to our fully managed
serverless infrastructure and globally available Network, BigQuery eliminates the work
associated with provisioning and maintaining a traditional data warehousing infrastructure.

BigQuery simplifies data operations by using Identity and Access Management, or IAM, to control users' access to resources, by creating roles and groups and assigning permissions for running BigQuery jobs and queries in a project, and by providing automatic data backup and replication.

(Refer Slide Time: 04:09)

271
BigQuery is a fully managed service, which means that the BigQuery engineering team here at Google takes care of all updates and maintenance. Upgrades should not require downtime or hinder system performance. This frees up real person-hours, since you do not have to worry about these common maintenance tasks.

(Refer Slide Time: 04:26)

Users do not need to provision resources before using BigQuery. Unlike many RDBMS systems, BigQuery allocates storage and query resources dynamically based on your usage patterns. Storage resources are allocated as users consume them and then deallocated as they remove data or drop tables. Query resources are allocated according to the query type and the complexity of the SQL. Each query uses a number of what are called slots, units of computation that comprise a certain amount of CPU and RAM.

Users don't have to make a minimum usage commitment to use BigQuery. The service allocates and charges for resources based on actual usage. By default, all BigQuery users have access to 2,000 slots for query operations. They can also reserve a fixed number of slots for their project if they want.

272
(Refer Slide Time: 05:21)

There are situations where you can query data without loading it, for example, when using a public or shared dataset, Stackdriver log files, or other external data sources. For all other situations, you must first load your data into BigQuery before running your queries. In most cases, you load data into BigQuery's native storage, and if you want to get data back out of BigQuery, you can then export the data.

The gsutil tool is a Python application that lets you access Cloud Storage from the command line. You can use gsutil to do a wide range of bucket and object management tasks, including uploading, downloading, and deleting objects. The officially supported installation and update method for gsutil is as part of the Google Cloud SDK. The bq command-line tool is another Python-based command-line tool, and it is also installed through the SDK.

The bq command-line tool serves many functions within BigQuery, but for loading, it is good for large data files, scheduled uploads, creating tables, defining schemas, and loading data
with one single command. You can use the BigQuery web interface in the GCP console as a
visual way to complete various tasks, including loading and exporting data, as well as running

273
your queries. The BigQuery API allows a wide range of services, such as Cloud Dataflow and Cloud Dataproc, as we talked about earlier, to load or extract data from BigQuery.
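As a hedged sketch tying these tools together (the bucket, dataset, and table names are placeholders), a CSV file might be staged with gsutil, loaded with bq, and then queried:

    # Stage the file in Cloud Storage
    gsutil cp ./sales_2024.csv gs://my-bucket/imports/sales_2024.csv

    # Load it into a BigQuery table, letting BigQuery detect the schema
    bq load --source_format=CSV --autodetect \
        mydataset.sales gs://my-bucket/imports/sales_2024.csv

    # Query the table with standard SQL
    bq query --use_legacy_sql=false \
        'SELECT region, SUM(amount) AS total FROM mydataset.sales GROUP BY region'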

The BigQuery data transfer service for cloud storage allows you to schedule recurring data loads
from cloud storage to BigQuery. It also automates data movement from a range of software-as-a-
service applications to BigQuery on a scheduled and managed basis. The BigQuery data transfer
service is accessible through the GCP console, the BigQuery web UI, the BQ command-line tool,
or the BigQuery data transfer services API.

Another alternative to loading data is to stream the data one record at a time. Streaming is
typically used when you need the data to be immediately available, such as a fraud detection
system, or a monitoring system. While load jobs are free in BigQuery, there is a charge for
streaming data. Therefore, it is important to use streaming in situations where the benefits
outweigh the costs.
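A hedged sketch of streaming inserts with the Python client library follows; the table and field names are placeholders.

    # Stream a couple of rows so they are queryable almost immediately
    from google.cloud import bigquery

    client = bigquery.Client()
    table_id = "my-project.mydataset.transactions"

    rows = [
        {"transaction_id": "t-1001", "amount": 42.50},
        {"transaction_id": "t-1002", "amount": 13.75},
    ]

    errors = client.insert_rows_json(table_id, rows)
    if errors:
        print("Streaming insert reported errors:", errors)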

(Refer Slide Time: 07:52)

To take full advantage of BigQuery as an analytical engine, you should store your data inside BigQuery's native storage. However, your specific use case might benefit from analyzing external sources, either by themselves or joined together with data in BigQuery storage. As well as the many

274
partner tools that are already integrated with BigQuery, Google Data Studio can be used to draw
analytics from BigQuery and build sophisticated interactive data visualizations and dashboards
for your teams.

275
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-74
Quiz

(Refer Slide Time: 00:04)

Now let’s test your knowledge with this short quiz. Which of the following is true concerning
BigQuery? What do you think? BigQuery is a fully managed service. You aren’t required to
build servers or manage storage.

276
(Refer Slide Time: 00:18)

Which manage service should you use if you want to lift and shift an existing Hadoop cluster
without having to rewrite your Spark code? What do you think? Cloud Dataproc is your best
option if you want to take your existing Hadoop cluster and build something similar in the
Cloud.

(Refer Slide Time: 00:40)

277
Now, which of the following services leverages the Apache Beam SDK to perform ETL
operations on both batch and streaming data? How about this one? If you said Cloud Dataflow,
that is exactly right. It is a serverless managed data service that can perform ETL operations on
both batch and streaming data using the Apache Beam SDK. Nice work.

278
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-75
Summary

(Refer Slide Time: 00:07)

That concludes the You have the data, but what are you doing with it? module. Let me remind you of what you have learned so far. Cloud Dataproc provides a fast, easy, cost-effective way to run
Apache Hadoop and Apache Spark, which are open source Big Data technologies that support
Big Data Operations. Cloud Dataproc use cases include helping with log processing, ad-hoc data
analysis, and even machine learning.

Cloud Dataflow uses the Apache Beam SDK to offer simplified stream and batch data processing pipelines. You use Cloud Dataflow to build those data pipelines, monitor their
execution, and then transform and analyze that data. Remember the discussion on sources and
sinks. Cloud Dataflow templates enable the rapid deployment of standard job types. The
BigQuery service replaces the typical hardware setup for a traditional data warehouse, and again
it serves as a collective home for all of your analytical data within your organization.

279
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-76
Module introduction

(Refer Slide Time: 00:07)

Hi, welcome to Let machines do the work. I am Evan. In this module, you learn about the ways Google can help you make decisions with machine learning. So far in this Google Cloud Computing Foundations Course, you have discussed what the Cloud is, the Google Cloud Platform itself, and using GCP to build apps. You then explored storage options, the role of APIs, cloud security, networking, automation and monitoring, and some of the managed services that Google offers to process your Big Data.

In this tenth and final module, you will explore the world of machine learning and the role machine learning can play in putting your data to work.

280
(Refer Slide Time: 00:41)

This module's objective is for you to explain what machine learning is, the terminology that is used, and its value proposition for your business. The specific learning objectives to achieve this include being able to discuss machine learning in the Cloud, explore building bespoke machine learning models using AI Platform on GCP, leverage Cloud AutoML to create custom machine learning models with no code, and apply a range of pre-trained machine learning models using Google's machine learning APIs.

(Refer Slide Time: 01:14)

281
This agenda shows the topics that make up this module. The module starts with a high-level introduction to what machine learning is and how it takes place, followed by an introduction to the GCP machine learning spectrum. You will then do a fun exercise to see firsthand how a neural network can learn to recognize doodles. Next, you will explore the complexities behind developing bespoke machine learning models, but also how the AI Platform, Google's managed machine learning service, makes it easier to take machine learning projects from ideation to production and deployment.

For the first lab of this module, you learn how to train a TensorFlow model both locally and on the AI Platform, and then how to use your trained model for prediction. Next, you learn how Cloud AutoML allows you to train high-quality custom machine learning models with minimal effort or machine learning expertise. This will be followed by a lab where you use Cloud AutoML Vision to train a custom model using images.

In the last topic of this module, you will explore the range of pre-trained machine learning APIs that Google has made available for common applications. This will be supported by the completion of three hands-on labs where you use the Cloud Natural Language, Cloud Speech, and Video Intelligence APIs. The module will end with a short quiz and a high-level recap of the key learning points from the module.

282
Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-77
Introduction to ML

(Refer Slide Time: 00:07)

The world is filled with things that we're able to react to and understand without much thought. For example, a stop sign that is partially covered by snow is still a stop sign, and a chair that is five times bigger than usual is still a place to sit. But for computers, which do not benefit from growing up and learning the nuances of these objects, the world is often much more messy and complicated.

To start the first topic, watch this video, Making Sense of a Messy World, in which Google engineers and researchers discuss how machine learning is beginning to make computers, and many of the things that we use them for, such as Maps, Search, video recommendations, translations, and so on, much better.

(Video Start Time: 00:47)


(Video End Time: 04:32)

283
(Refer Slide Time: 04:34)

You have heard a lot about machine learning, or ML. Let's start with a definition: what is ML? Here is a definition I like to use. ML is a way to get predictive insights from data to make repeated decisions. You do this using algorithms that are relatively general and applicable to a wide variety of data sets. Think of a typical company and how they use their data today. Perhaps they have a dashboard that business analysts and decision-makers view on a daily basis, or a report that is read on a monthly basis. This is an example of a backward-looking use of data.

Looking at historical data to create reports and dashboards is what people tend to mean when they talk about BI, or business intelligence. A lot of data analytics is backward-looking, and there is nothing wrong with that. Instead, we use ML, or machine learning, to generate forward-looking or predictive insights. Of course, the point of looking at historical data might be to make decisions. Perhaps the business analyst examines the data and suggests new policies or rules.

For example, it might be possible to raise the price of a product in a certain region. Now that business analyst is making a predictive insight, but is that scalable? Can a business analyst make such a decision for every single product in every single region? And can they dynamically adjust the

284
price every second? This is where the computers get involved. In order to make decisions around predictive insights repeatedly, you need ML.

You need a computer program to derive those insights for you. So, ML is about making predictive insights from data, many of them at a time. It is about scaling up BI and decision-making. The other part of the ML definition is around the use of standard algorithms. ML uses standard algorithms to solve what look like seemingly different problems. Normally, when we think of computers, we think of programs that do different things.

(Refer Slide Time: 06:39)

For example, the software used to file your taxes is very different from the software used to get
directions home when driving. Machine learning is a little different. You use the same software
under the hood. That's what we mean when we say ML uses standard algorithms, but you can
train that software to do very different things. You can train the software to estimate the amount
of taxes that you owe or train that to offer to estimate the amount of time it will take to get you
home.

This ML software, once trained on your specific use case, is called a model. So, you have a model that can estimate your taxes and a model that can estimate the time to get you home. We use

285
the term model because it is an approximation; it is a model of reality. For example, if we give the computer lots of historical data on drive times to New York City, it will learn the relationships in the data, such as traffic patterns, seasonality, and the impact of time of day, to predict today's commute time home.

(Refer Slide Time: 07:41)

Whatever the domain, ML modeling requires lots of training examples. We will train the model
to estimate tax by showing it many, many, many examples of prior year tax returns or train the
model to estimate trip duration by showing it many, many, many different journeys. So, the first
stage of ML is to train the ML model with lots of good examples.

286
(Refer Slide Time: 08:08)

An example consists of an input and the correct answer for that input, which is called the label. In the case of structured data, with rows and columns, an input can simply be a single row of data. In unstructured data, like images, an input can be a single image, say a cloud that you want to classify: is this a rain cloud, or is it not? Now, imagine you work for a manufacturing company. You want to train a machine learning model to detect defects in parts before they are assembled into the final products for users.

You can start by collecting a dataset of images of these parts. Some of the parts will be good; some will be fractured or broken. For each image, you assign the corresponding label, that is, the right answer: broken or not broken. You then use this set of
examples as training data for your model.

287
(Refer Slide Time: 09:06)

After you train the model, you can then use it to predict the label of images that it has never seen before. Learn from the past, predict for the future. Here, your input for the trained model is an image of a part. Because the model has already been trained, it is correctly able to predict that this part is in good condition. Note that the image here is different from the ones used in our training examples. However, it still works because the ML model has generalized.

It has not memorized the training data, those specific examples that you showed it, but has learned a more general idea of what a part in good condition looks like. So, why do we say these algorithms are standard? Algorithms exist independently of your use case. Detecting manufacturing defects in images of parts and detecting something like diseased leaves in images of trees are two very different use cases.

(Refer Slide Time: 10:03)

The same algorithm, an image classification network, works for both. Similarly, there are
standard algorithms for predicting the future value of a time series data set and transcribing
human speech to text.

(Refer Slide Time: 10:21)

ResNet is a standard algorithm for image classification. It is not crucial to understand how
an image classification algorithm works, only that it is the algorithm you should use if you need to
classify images of automotive parts, and that you can use the same algorithm on different data sets.

(Refer Slide Time: 10:37)

There are different features or inputs relevant to the different use cases, and you can see them
represented visually here.
(Refer Slide Time: 10:48)

You might be asking yourself: isn't the logic different? You cannot possibly use the same
rules for identifying defects in manufacturing that you do for identifying different types of
leaves. You are right, the logic is different, but ML does not use logical if-then rules. The image
classification network isn't a set of rules like "if this, then that" but a function that learns how to
distinguish between categories of images.

Although we start with the same standard algorithm, after training, the trained model that
classifies leaves is different from the trained model that classifies manufacturing parts. You can,
however, reuse the same code for other use cases focused on the same kind of
task. In our example, we are identifying manufacturing defects, where the higher-level task is
classifying images.

You can reuse the same code for another image classification problem, like finding examples of
your products in photos posted on social media. However, you still have to train it separately for
each use case.

(Refer Slide Time: 11:56)

The main thing to know is that your model will only be as good as your data. And more often
than not, you use a lot of data for machine learning. For the example that we talked about, you
will need a large dataset of historical examples of both rejected parts and parts in good condition
in order to train a model to categorize parts as defective or not.

The basic reason why ML models need high-quality training data is because they do not have
human general knowledge like we do. Data is the only thing that they have access to, to learn
from.

Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-78
ML and GCP

(Refer Slide Time: 00:07)

In this topic, you will be introduced to the different options that exist in GCP when it comes to
leveraging machine learning. First, you will explore the relationship between machine learning,
artificial intelligence, and deep learning.

(Refer Slide Time: 00:15)

A very common question asked is: what is the difference between AI (Artificial Intelligence),
machine learning, and deep learning? Well, one way to think about it is that AI is a discipline,
something like physics. AI refers to machines that are capable of acting autonomously, machines
that think. AI has to do with the theory and methods to build machines that can solve problems by
thinking and acting like humans.

Machine learning, within that discipline, is a toolset, like Newton's laws of mechanics. Just as you can use
Newton's laws to figure out how long it will take a ball to drop when it falls off a cliff, you
can use machine learning to scalably solve certain kinds of problems using data examples, but
without the need for any custom code.

Deep learning is a type of machine learning that works even when the data consists of
unstructured data like images, speech, video, natural language, text, and so on. One kind of deep
learning is image classification. A machine can learn how to classify images into categories
when it is shown lots of different examples. The really cool thing about deep learning is that
often, on a complex problem, it can do better than a human can. The basic difference between
machine learning and other techniques in AI is that in machine learning, machines learn. They do
not start out intelligent; they become intelligent.

(Refer Slide Time: 01:45)

Back to our example: let's say we have built a machine learning model to find badly manufactured
products so that we can remove them. Quality control is now pretty inexpensive. So what?
The business factor motivating us is that the business will save money. We could add quality
control throughout our entire manufacturing process. Instead of just doing quality control at the
end of the manufacturing line, we can now insert it everywhere and improve overall quality.

The opportunity is for organizations to take advantage of the ease of creating new models to
continue to transform their business. So, you know what ML is, and I hope you have started to
come up with some of those ideas related to ML.

(Refer Slide Time: 02:31)

Much of the hype around ML now is because the barriers to entry for building these models have fallen
dramatically. You do not have to be an astrophysicist to do machine learning. This is because
of the convergence of a number of critical factors: the increasing availability of data, the
increasing maturity and sophistication of the ML algorithms you can choose from, and the
increasing power and availability of computing hardware and software through things like
cloud computing.

(Refer Slide Time: 02:58)

Let me show you an example. Imagine we want to build that ML model to identify diseased
leaves to predict the health of the trees. Remember, we can do that using a standard algorithm for
image classification. You don't need to have a Ph.D. in image processing. You just need to know
which algorithm to choose off the shelf. But back to our ML model: another critical
ingredient for ML is data, so we need to collect lots of images of leaves.

Today, you can do that pretty easily with the camera on your phone. Finally, we need the
hardware and the software to make that happen, and that is easier now than it has ever been in
the past. We can use the cloud to power our ML model so that we can do it cost-effectively.
Different options exist when it comes to leveraging machine learning.

(Refer Slide Time: 03:44)

Advanced users who want more control over the building and training of their ML models will
use tools that offer the levels of flexibility that they are looking for. This could involve
developing custom models through an ML library like TensorFlow, supported on the AI
platform. This option works well for data scientists with the skills and the need to create a
custom TensorFlow model.

But increasingly, you do not have to do that. Google makes the power of machine learning available
to you even if you have limited knowledge of machine learning. You can use Cloud AutoML,
like you are going to do in one of your labs, to build on Google's machine learning capabilities and
create your own custom machine learning models tailored to your specific business needs.

You can then integrate those models into applications and websites, all without writing a line of
TensorFlow code. Alternatively, Google has a range of pre-trained machine learning models,
meaning you do not need to bring your own data, that are ready for immediate use within
applications in the ways that the respective APIs are designed to support. Such pre-trained models
are excellent ways to replace user input with machine learning.

Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-79
Building Bespoke ML models

In this topic, you will take a high-level look at the complexities behind developing bespoke
machine learning models, but also how AI Platform, Google's managed machine learning service,
makes it easier for machine learning developers, data scientists, and data engineers to take their
machine learning projects from ideation to production and deployment.

(Refer Slide Time: 00:22)

Earlier, you were introduced to the idea that leveraging machine learning can be split into three areas.
In this topic, you look at the most complex but also the most adaptable option of the three.

(Refer Slide Time: 00:36)

As a starting point, let us talk a little bit about TensorFlow. TensorFlow is an open-source, high-
performance library for numerical computation, not just machine learning, but any numerical
computation. In fact, people have used TensorFlow for all kinds of GPU computing. For
example, you can use TensorFlow to solve partial differential equations. These are very useful in
domains like fluid dynamics. As a numerical programming library, TensorFlow is appealing
because you can write your own computation code in a high-level language like Python and have
it be executed in a very fast way that matters at scale.

(Refer Slide Time: 01:11)

TensorFlow works because you create a Directed Acyclic Graph, or DAG, to represent your
computation. Directed means it has a direction of flow. Acyclic means that it can't feed into
itself, meaning that it isn't a circle, and graph because it has those nodes and edges. In this
schematic, the nodes represent mathematical operations: things like adding, subtracting and
multiplying, and also more complex functions.

Here, for example, softmax, matrix multiplication, and so on are mathematical operations that
are part of this directed graph. Connecting the nodes are the edges, the lines that are the inputs and
outputs of those mathematical operations. The edges represent arrays of data. Essentially, the
result of computing the cross-entropy is one of the inputs to this bias add operation, and
the output of the bias add operation is sent along to the matrix multiplication operation, or
MatMul. The other input to the MatMul (you need two inputs to multiply matrices together) is
the variable, that is, the weight.
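
As a rough illustration (not the exact graph on the slide), here is a small TensorFlow sketch in which a matrix multiplication, a bias add, and a softmax are chained together; tf.function traces the Python code into a graph, and the tensors flow along its edges. The shapes and values below are arbitrary.

import tensorflow as tf

@tf.function  # traces this Python function into a TensorFlow graph (a DAG)
def forward(x, weights, bias):
    logits = tf.matmul(x, weights)         # MatMul node: two tensor inputs, one tensor output
    logits = tf.nn.bias_add(logits, bias)  # BiasAdd node
    return tf.nn.softmax(logits)           # Softmax node

x = tf.constant([[1.0, 2.0]])                 # input tensor, shape (1, 2)
w = tf.Variable([[0.5, -0.5], [0.25, 0.75]])  # the weight variable
b = tf.Variable([0.1, -0.1])                  # the bias variable
print(forward(x, w, b))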

(Refer Slide Time: 02:26)

So, where does the name TensorFlow come from? In math, a simple number like 3 or 5 is called
a scalar. A vector is a one-dimensional array of numbers. In physics, a vector (I know everyone's
trying to remember it right now) is something that has magnitude and direction, but in
computer science, we use vector to mean a one-dimensional, or 1D, array. A 2-dimensional array
is called a matrix. A 3-dimensional array, well, we just call it a 3D tensor.

So: a scalar, a vector, a matrix, a 3D tensor, a 4D tensor, and so on. A tensor is, therefore, an n-
dimensional array of data. Your data in TensorFlow are tensors. They flow through the
graph, hence TensorFlow.
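
A quick sketch of those ranks in TensorFlow; the values here are arbitrary.

import tensorflow as tf

scalar   = tf.constant(3)                         # rank 0: a single number
vector   = tf.constant([1, 2, 3])                 # rank 1: a 1D array
matrix   = tf.constant([[1, 2], [3, 4]])          # rank 2: a 2D array
tensor3d = tf.constant([[[1], [2]], [[3], [4]]])  # rank 3: a 3D tensor

for t in (scalar, vector, matrix, tensor3d):
    print(t.shape, "rank", tf.rank(t).numpy())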

(Refer Slide Time: 03:23)

You can build a DAG in Python, store it in a SavedModel, and restore it in a C++ program for
low latency predictions. You can use the same Python code and execute it both on CPUs and
GPUs. This provides language and hardware portability.
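
For instance, a toy model built with the Python API can be exported as a SavedModel directory and restored later, by Python here, or by a C++ runtime such as TensorFlow Serving. This is only a minimal sketch with an arbitrary model and export path.

import tensorflow as tf

# Build a (toy) model in Python, then export it as a SavedModel directory.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(3,)),
    tf.keras.layers.Dense(1),
])
tf.saved_model.save(model, "exported_model")

# The same directory can later be restored for serving low-latency predictions.
restored = tf.saved_model.load("exported_model")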

(Refer Slide Time: 03:45)

Like most software libraries, TensorFlow contains multiple abstraction layers. The lowest level
of abstraction is a layer that is implemented to target the different hardware platforms like
running ML on your mobile device. In the latest version of TensorFlow, you also get the
Accelerated Linear Algebra, or XLA, framework, a faster compiler for all the math that underpins
your ML models. Again, this is very low-level stuff that you should probably know exists but
probably wouldn't interface with directly.

On top of that hardware, and moving up to more abstraction, is the execution engine for
TensorFlow, written in C++ for highly efficient operations. You could write an entire
TensorFlow function in C++ if you wanted to and register it as a TensorFlow operation.
Generally, data scientists will use the APIs, which are a bit more abstract and next on our list.
Next are the familiar front-end SDKs, or software development kits, where you can use C++,
Python, Go, Java, etc. to access TensorFlow operations. I will be honest with you, though: a
lot of my work in TensorFlow uses pre-built ML ingredients provided through the Keras and
Dataset APIs. Keras is a high-level neural networks API written in Python, and it can run on top
of TensorFlow. Here is how friendly and abstract the Keras library is.

(Refer Slide Time: 05:13)

Remember the deep neural network from before that found the dog hiding in the laundry basket?
If you are going to use Keras, creating and adding those layers into that DNN would simply be
model.add() and how large you want the layer to be in units. There are some other concepts for
image models, like softmax and ReLU layers, that you will pick up when working with these
different algorithms, but the code itself for building these layers is just like stacking block after
block.
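
As a sketch of that "stack block after block" style, here is what a small Keras model might look like; the layer sizes and input shape are arbitrary choices, not taken from the lecture's example.

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.Input(shape=(784,)))                     # a flattened input image
model.add(tf.keras.layers.Dense(128, activation="relu"))    # hidden layer: 128 units
model.add(tf.keras.layers.Dense(64, activation="relu"))     # another hidden layer
model.add(tf.keras.layers.Dense(10, activation="softmax"))  # output: class probabilities

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])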

(Refer Slide Time: 05:40)

Now, let's talk about data size. If you have a small dataset, one that just fits in memory, pretty
much any ML framework, in Python and so on, will work. They have statistical packages
that often take three or four lines of code, and it will just work. But these are in-memory datasets.
Once your datasets get larger, these packages will not work. You will need to split your data
into batches and then train. You will also need to distribute your training over many,
many machines.

Sometimes people think that they can take a shortcut to keep the training simple by
getting a bigger and bigger single machine with lots of GPUs, but that's not the case.
Scaling out is the answer, not scaling up. Another common shortcut that people take is to
sample their data so that it is small enough to do ML on the hardware that they happen to
have. They are limiting the effectiveness of ML by not using all of the data for the model to learn
from. Using all of that data, and then devising a plan to collect ten times the data they currently
have, is often the difference between ML that does not work and ML that appears magical.

(Refer Slide Time: 06:53)

Some of the other major improvements to ML happen when human insights come into the
problem. In ML, you bring human insights, that is, what your experts know about the use case and
the dataset, in the form of refinements to existing features or the addition of new features, in a
process that is often called feature engineering. You will also need to pre-process your raw data,
scale it, and encode it. With a large dataset, these steps also need to be distributed and then
done in the Cloud for scale.

Now, that is just on the training side. Once you have completed and successfully trained your
model, you want to deploy it for production. At that point, the performance characteristics
change: instead of just thinking about how long it takes to train on your training dataset, you must
also think about how it is going to support the number of prediction queries per second, or QPS,
that you are going to need.

That requires your solution to be able to scale the prediction code as necessary to support the
users who need to make those timely predictions. Now, the types of questions that you will ask
yourself here, in the serving of your model, are: what if the underlying model changes because you have
retrained it? What if the parameters used in the model need to change? What if the number of
inputs or the data changes? Do you really want to expose all this to your users? What if the client
isn't written in the language that you used to train the model?

You can invoke the TensorFlow API from pretty much any programming language. You can use
Cloud server resources to automatically scale out to as many queries per second, or QPS, as
you need for those predictions. But these are things that need to be managed, and that can be a
challenge if you need to rapidly scale out that solution.

Earlier, we mentioned feature engineering and how to build those pipelines to pre-process your
training data before training. This same pre-processing must also happen at prediction time. But
beyond cleaning up the data, there are various ways that your training setup could end up a bit
different from your prediction one. Using a standard, like AI Platform, helps minimize these
issues. Finally, your prediction inputs will often be systematically
different from those you used to train your model, in subtle and hard-to-detect ways: maybe the
average of some column has shifted, or the variance has grown over time. This is a
phenomenon that we call training-serving skew, and detecting it requires continual data
collection and re-examination.

(Refer Slide Time: 09:36)

AI Platform simplifies this for you, ensuring that the trained model is what you actually run. In
other words, it helps you handle that training-serving skew. The platform will keep track of all
those pre-processing and feature engineering steps for you, as well as allow you to version your
model over time. In training, AI Platform will help you distribute that pre-processing, train
your model multiple times iteratively, and deploy your trained model to the Cloud for predictions.

Now, when we talk about predictions, the ML model is accessible through something like a
simple REST API, and it includes all the pre-processing and feature transformation that you did.
So, the client code can simply supply the raw input variables and get back a prediction. AI
Platform can also distribute the model as needed to supply a high number of queries per second
from the people who want to make predictions with your trained model. With ML, you need
high-quality execution at both training and prediction time.

If you have a great model, but it is super slow for those timely predictions, no one will use it.
While computing a TensorFlow model once is relatively cheap, the point of an ML
model is to make those predictions for lots of incoming requests.

(Refer Slide Time: 10:56)

Let's look at a diagram that provides a high-level overview of the stages in the ML workflow.
The blue-filled boxes indicate where AI Platform provides managed services and APIs for
your use. You must have access to a large dataset of training data that includes the attribute
called the label, the correct answer for machine learning; that is what you want to infer or
predict for the future based on all the other data that you have. Your other inputs are called
features.

For example, assume that you want your model to predict the sale price of a house. Begin with a
large dataset describing the characteristics of houses in a given area, including things like the
sale price of each house and when it was sold. After you have sourced your data, you have to
analyze and understand it and pre-process it to get it ready for machine learning. In this
pre-processing step, you transform valid, clean data into the format that best suits your model's
needs.

TensorFlow already has a lot of pre-processing libraries that you can use automatically with
AI Platform. In addition, consider other GCP services that we talked about, like BigQuery,
Cloud Dataproc, Cloud Dataflow, Cloud Data Fusion, and Cloud Dataprep, to help you with
those transformations. A lot of the work in machine learning is just getting clean data
ready for machine learning. Then you can develop your model using established ML techniques
or by defining new operations and approaches. You can start learning how to create your model
by working through the documentation provided by TensorFlow, scikit-learn, and XGBoost.

AI Platform provides the services you need to train and then evaluate your model in the Cloud.
When training your model, you feed it data, those input features, for which you already
know the value of the target data attribute, the historical right answer, again called the
label. You then run the model to predict the target values for your training data so that your
model can adjust its settings to fit the data better and predict the target value more
accurately. This is the whole learning part of machine learning.

Similarly, when evaluating your trained model, you feed it data that includes the target values.
You compare the results of your model predictions to the actual values for the evaluated data,
and then you use statistical techniques appropriate to your model to gauge its success. That is
how well it learned. You can then tune the model by changing the operations or the settings that
you use to control for training purposes. These are called hyper parameters, such as the number
of training steps to run in training. This technique of adjusting those model knobs is called hyper
parameter tuning.

AI Platform provides tools to upload your trained ML model to the Cloud so that you can send
prediction requests to the model. To deploy your trained model on AI Platform, you must
first save your model using the tools provided by your machine learning framework. This
involves serializing the information that represents your trained model into a file that you can
then simply deploy for prediction in the Cloud.

On GCP, you would upload the saved model to a Google Cloud Storage bucket and then create a
model resource on AI Platform, where you just specify the Cloud Storage path to where your
saved model is located. AI Platform provides the services you need to request predictions from
your model in the Cloud. There are two ways to get predictions from trained models: Online
Prediction, sometimes called HTTP prediction, and Batch Prediction. In both cases, you pass
input data to your Cloud-hosted machine learning model, and you get inferences for each
data instance.

You can monitor the predictions on an ongoing basis. AI Platform provides APIs to
examine all your running jobs. In addition, various GCP tools support your deployed model's
operation, such as the entire suite of tools within Stackdriver. AI Platform provides various
interfaces for managing your model and model versions, including a REST API, the gcloud
ai-platform command-line tool, and the GCP console.
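
As a minimal sketch of what requesting an online prediction can look like, assuming a model already deployed on AI Platform and using the documented "ml" v1 API through the Google API Python client; the project, model name, and instance format below are placeholders that depend on your own deployment.

import googleapiclient.discovery

project = "my-project"          # placeholder GCP project ID
model = "parts_classifier"      # placeholder AI Platform model resource name
instances = [{"values": [1.0, 2.0, 3.0]}]  # format depends on your model's serving signature

service = googleapiclient.discovery.build("ml", "v1")
name = f"projects/{project}/models/{model}"  # optionally append "/versions/<version>"

response = service.projects().predict(name=name, body={"instances": instances}).execute()
print(response.get("predictions") or response.get("error"))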

Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-80
Cloud AutoML

(Refer Slide Time: 00:11)

In this topic, you learn how Cloud AutoML allows you to train high-quality custom machine
learning models with minimal effort or machine learning expertise. Cloud AutoML is a suite of
machine learning products that enables users with limited machine learning experience to train
high-quality models specific to their business needs. Cloud AutoML leverages more than ten
years of proprietary Google research technology to help users' ML models achieve faster
performance and more accurate predictions.

(Refer Slide Time: 00:33)

To put Cloud AutoML into context, let's look at what it takes to solve an ML problem. To solve
an ML problem without the benefit of a managed service, it is up to you to wrangle your data,
code the model, and put together all the infrastructure to serve the predictions. This can be
prohibitively complex and very time-consuming.

(Refer Slide Time: 00:52)

Earlier, you saw how AI Platform lets developers and data scientists build and run machine
learning models in production. As shown here, there is a considerable reduction in the required
training and serving infrastructure, as well as the overall amount of model code. However, there
is still a requirement to provide extensive training data, and the process is still a time-consuming
one.

(Refer Slide Time: 01:17)

What is immediately notable with Cloud AutoML is that there is no requirement on the user's
side to develop a model or provide training and serving infrastructure. In addition, far less
training data is required, and the results are achieved a lot faster.

(Refer Slide Time: 01:34)

The ability of Cloud AutoML to efficiently solve an ML problem is largely due to how it
simplifies these complex steps associated with custom ML model building. Two Cloud AutoML
products apply to what you can see: AutoML Vision and AutoML Video Intelligence.

(Refer Slide Time: 01:52)

With AutoML Vision, you can upload images and train custom image models through an easy-
to-use graphical interface. You can optimize your model for accuracy, latency, and size. AutoML
Vision Edge allows you to export your custom trained model to an application in the Cloud or an
array of devices at the edge. You can train models to classify images through labels that you
choose. Alternatively, Google's data labeling service allows you to use their team to help
annotate your images, or videos, or text.

Later, you will complete a lab where you use Cloud AutoML Vision to train a custom model to
recognize the different types of Clouds.

(Refer Slide Time: 02:37)

AutoML Video Intelligence makes it easy to train custom models to classify and track objects
within videos. It is ideal for projects that require custom entity labels to categorize content,
which is not covered by the pre-trained Video Intelligence API. Two Cloud AutoML products
apply to language: AutoML Natural Language and AutoML Translation.

(Refer Slide Time: 03:10)

With AutoML Natural Language, you can train custom ML models to classify, extract, and
detect sentiment. This allows you to identify entities within documents and then label them based
on your domain-specific keywords or phrases. The same applies to understanding the overall
opinion, feeling, or attitude expressed in a block of text, tuned to your domain-specific
sentiment scores.

(Refer Slide Time: 03:32)

AutoML Translation allows you to upload translated language pairs and then train a custom
model where translation queries return specific results for your domain and scale and adapt to
meet your needs.

(Refer Slide Time: 03:46)

AutoML Tables reduces the time it takes to go from raw data to top-quality production-ready
machine learning models from months to just a few days. There are many different use cases for
AutoML Tables. For example, if you are in retail, you can better predict customer demand. You
can preemptively fill gaps and maximize your revenue by optimizing product distribution,
promotions, and pricing. If you are in the insurance business, you could foresee and optimize a
policyholder portfolio's risk and return by zeroing in on the potential for large claims or the
likelihood of fraud.

In marketing, you can better understand your customers. For example, what is the average
customer's lifetime value? You can make the most of marketing spend by using AutoML Tables to
estimate predicted purchasing value, volume, frequency, lead conversion probability, and churn
likelihood.

Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-81
Googles Pre-trained ML APIs

In the previous topic, you learned how you could build custom ML models with minimal effort
or ML expertise by leveraging the suite of ML products offered through Cloud AutoML. When
using Cloud AutoML, you define and label the domain-specific training dataset used to create the
custom ML model you require. If you do not need a domain-specific dataset, however, Google's
suite of pre-trained ML APIs might meet your needs. In this topic, you will explore some of
those APIs and apply them through a series of labs.

(Refer Slide Time: 00:35)

APIs like the Vision API, Natural Language API, or Translation API are already trained for
common ML use cases like image classification. They save you the time and effort of building,
curating, and training a new dataset, so you can jump right ahead to your predictions. For pre-
trained models, Google has already figured out a lot of those hard problems for you.

(Refer Slide Time: 01:03)

Let's explore some of these pre-trained machine learning APIs. Let's start with the Cloud
Vision API. There are three major components that all roll up into this RESTful API, and behind
the scenes, each of these is powered by many ML models and years of research. The first is
detecting what an image is and then classifying it. The Vision API picks out the dominant entity,
for example, a car or a cat, within an image from a broad set of object categories. This allows
you to easily detect broad sets of objects within your images.

Facial detection can detect when a face appears in photos, along with the associated facial
features, such as eye, nose, and mouth placement, and the likelihood of over eight
attributes like joy and sorrow. However, facial recognition isn't supported, and Google doesn't
store facial detection information on any Google server. You can use the API to easily build
metadata on your image catalog, enabling new scenarios like image-based searches or
recommendations.

(Refer Slide Time: 02:09)

Next are images with text, like scanned documents or a sign out there on the road. The Vision API
uses optical character recognition, or OCR, to extract text in a wide range of languages into a
selectable and searchable format.

(Refer Slide Time: 02:26)

Lastly is a bit of intuition from the web, which uses the power of Google Image Search: does the
image contain entities that we know, like the Eiffel Tower or a famous person? Landmark
detection allows you to identify popular natural and man-made structures, along with the
associated latitude and longitude of the landmark, and logo detection allows you to
identify product logos within an image.

You can build metadata on your image catalog, extract text, moderate offensive content, or enable
new marketing scenarios through image sentiment analysis. You can also analyze images
uploaded in the request or integrate with image storage on Google Cloud Storage.
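
A minimal sketch of calling the Vision API from Python, assuming the google-cloud-vision client library is installed and authenticated; the Cloud Storage URI is a placeholder.

from google.cloud import vision

client = vision.ImageAnnotatorClient()
image = vision.Image(source=vision.ImageSource(image_uri="gs://my-bucket/photos/sample.jpg"))

# Label detection: what entities appear in the image?
for label in client.label_detection(image=image).label_annotations:
    print(label.description, round(label.score, 2))

# Text detection (OCR) uses the same client with a different method.
ocr = client.text_detection(image=image)
if ocr.text_annotations:
    print(ocr.text_annotations[0].description)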

(Refer Slide Time: 03:11)

There are two APIs that apply to speech. The Cloud Text-to-Speech API converts text into
human-like speech in more than 180 voices across more than 30 languages and variants. It
applies research in speech synthesis and Google's powerful neural networks to deliver high-
fidelity audio. With this API, you can create lifelike interactions with users that transform
customer service, device interaction, and other applications.

The Cloud Speech-to-Text API enables you to convert real-time streaming or pre-recorded audio
into text. The API recognizes 120 languages and variants to support a global user base. You can
enable voice command and control, transcribe audio from call centers, and so on.
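
A minimal Speech-to-Text sketch, assuming the google-cloud-speech client library; the audio URI, encoding, and sample rate are placeholders you would match to your own recording.

from google.cloud import speech

client = speech.SpeechClient()
audio = speech.RecognitionAudio(uri="gs://my-bucket/audio/call.wav")
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)

# Synchronous recognition works for short clips; streaming and long-running
# variants exist for real-time and longer audio.
response = client.recognize(config=config, audio=audio)
for result in response.results:
    print(result.alternatives[0].transcript)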

(Refer Slide Time: 04:04)

The Cloud Translation API provides a simple programmatic interface for translating an arbitrary
string into any supported language. The API is highly responsive, so websites and applications
can integrate with it for fast, dynamic translation of source text from a source language to a
target language, for example, from French to English. Language detection is also available in
cases where the source language is unknown.
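
A minimal sketch using the google-cloud-translate client library's basic (v2) interface; the input string is arbitrary.

from google.cloud import translate_v2 as translate

client = translate.Client()

# Translate into English; since no source language is given, the API detects it.
result = client.translate("Bonjour tout le monde", target_language="en")
print(result["translatedText"], result.get("detectedSourceLanguage"))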

Let us look at a short video that shows how Bloomberg, a global leader in business and financial
data news and insight, applies the Cloud Translation API to reach all of their customers
regardless of language.

(Refer Slide Time: 04:44)

The Cloud Natural Language API offers a variety of natural language understanding
technologies. It can do syntax analysis, breaking down sentences into tokens, identifying nouns,
verbs, adjectives, and other parts of speech, and also figuring out the relationships among
the words. It can also do entity recognition; in other words, it can parse text and flag mentions of
people, organizations, locations, events, products, and media. Sentiment analysis allows you to
understand customer opinions and find actionable product and UX insights.
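
A minimal sketch of sentiment and entity analysis with the google-cloud-language client library; the sample sentence is arbitrary.

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="The new release is fantastic, but the setup was confusing.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print("score:", sentiment.score, "magnitude:", sentiment.magnitude)

for entity in client.analyze_entities(request={"document": document}).entities:
    print(entity.name, entity.type_)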

(Refer Slide Time: 05:22)

The Cloud Video Intelligence API supports the annotation of common video formats and allows
you to use Google video analysis technology as part of your applications. This REST API
enables you to annotate videos stored in Google Cloud Storage with video-level and one-frame-per-
second contextual information. It helps you identify the key entities, that is, the nouns, within
your video and when they occur. You can also use it to make your content more accessible,
searchable, and discoverable.
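
A minimal sketch of label annotation with the google-cloud-videointelligence client library; the input URI is a placeholder, and label detection is just one of the available features.

from google.cloud import videointelligence

client = videointelligence.VideoIntelligenceServiceClient()
operation = client.annotate_video(
    request={
        "input_uri": "gs://my-bucket/videos/demo.mp4",
        "features": [videointelligence.Feature.LABEL_DETECTION],
    }
)

result = operation.result(timeout=300)  # annotation runs as a long-running operation
for annotation in result.annotation_results[0].segment_label_annotations:
    print(annotation.entity.description)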

Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-82
Quiz

(Refer Slide Time: 00:04)

Now let’s test what you have learned so far with a short quiz. Which machine learning tool will
be the best option for somebody who has limited application development or data science skills?
If you said Cloud AutoML, yes, that is a great option for when you want to leverage machine
learning, and you are not an application developer or a data scientist.

(Refer Slide Time: 00:21)

Next, what Google machine learning API can be used to gain insight and meaning from
sentiment inside of text? If you said the Cloud Natural Language API, yes, it is used to derive
that meaning and that sentiment from text.

(Refer Slide Time: 00:38)

Next up, which machine learning service can run TensorFlow at scale? What do you think? AI
platform allows you to run TensorFlow at scale by providing you that managed infrastructure.

Google Cloud Computing Foundation Course
Evan Jones
Technical Curriculum Developer
Google Cloud

Lecture-83
Summary

(Refer Slide Time: 00:07)

That concludes the "let machines do the work" module. Let's review what you have just
learned. Machine learning is a way to use standard algorithms to derive predictive insights
from data to make repeated decisions. The other part of the machine learning definition is around
those standard algorithms: ML uses these standard algorithms to solve seemingly different
problems. Whatever the domain, ML model training requires examples.

An example consists of an input and a correct answer for that input, which is called the label. After
you train an ML model, you can use it to predict the label of images that it has never seen before.
When you use the same algorithm on different datasets, there are different features or inputs
relevant to the different use cases. While the logic is different, ML does not use logical if-then
rules. Take, for example, the image classification network.

It is not a set of if-this-then-that rules, but a function that learns how to distinguish between
categories of images. This allows you to reuse the same code for other use cases that are
focused on the same kind of task. For machine learning, your models will only be as good as the
input data that you provide, and more often than not you need a lot of training data for these
models. The basic reason why ML models need a lot of high-quality data is that they do
not have what we have, which is human generalized knowledge that we have accumulated over
the years. Data is literally the only thing that they have access to and can learn from.

(Refer Slide Time: 01:42)

TensorFlow is an open-source, high-performance library for numerical computation. As a
numerical programming library, TensorFlow is very appealing because you can write your
computational code in a high-level language like Python and have it be executed in a very fast
way at runtime. AI Platform provides the services that you need to train and evaluate your model
in the Cloud. AI Platform provides you with the tools to upload your trained model to the
Cloud and the services that you need to request online and batch predictions from your model in the
Cloud. AI Platform provides APIs to examine running jobs and various interfaces for managing
your model and your model versions over time.

(Refer Slide Time: 02:32)

Cloud AutoML is a suite of machine learning products that enables users with limited ML
expertise to train high-quality models specific to their business use cases. The products include
AutoML Vision, Video Intelligence, Natural Language, Translation, and AutoML Tables. Lastly,
Google's pre-trained machine learning APIs save you the time and effort of building,
curating, and training a brand-new dataset, so you can jump right in and start making
predictions.
