C-Cdlilt-B - CDL Ilt Deck - Module 2 (v1.1)

Uploaded by

Saeed Nashar

Proprietary + Confidential

Module 2: Innovating with Data and Google Cloud

Lessons

01 The Value of Data
02 Data Consolidation and Analytics
03 Innovation with Machine Learning


Lesson 01: The Value of Data

What is data?

Data: Any information that is useful to an organization. Examples include audio or video recordings, images, or even just ideas in employees’ heads.

Businesses now have access to data like never before

Businesses have access to new information about their customers [Discussion]

Large enterprises face challenges when leveraging the value of data

Google Cloud offers...

Economies of scale

Automation

Rapid elasticity

Data access

Start using data in your digital transformation by mapping it

Data map: A chart of all the data used in end-to-end business processes, from start to finish.

There are many different types of data

Data point: Each single data item.
Dataset: The same types of data points aggregated into a single group of data.

Datasets can be further organized into ‘buckets’ [Discussion]

User data: All data from customers who use or purchase your services and products.
Corporate data: Data about your company, such as sales patterns and operations.
Industry data: Data found outside of an organization that everyone in the sector can access.

How can you make your data actionable?


Data is categorized into two types

Unstructured Structured

What is structured data?

Structured data is highly organized. Examples include customer records consisting of names, addresses, credit card numbers, and other quantitative data. Structured data can be easily stored and managed in databases.

What is unstructured data?

Unstructured data has no predefined organization and tends to be qualitative. Examples of unstructured data include word processing documents, audio files, images, and videos.
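
To make the contrast concrete, here is a minimal sketch in Python. The record fields, names, and the sample sentence are invented for illustration and do not come from the deck. It shows that structured records can be queried by field, while unstructured text has no fields and has to be searched or interpreted.

```python
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    """A structured record: every item has the same named, typed fields."""
    name: str
    city: str
    total_spend: float


# Structured data (hypothetical values).
structured = [
    CustomerRecord("Ana", "Lisbon", 120.0),
    CustomerRecord("Ben", "Leeds", 80.5),
]

# Unstructured data (hypothetical call note): free text with no fields.
unstructured = "Ana called from Lisbon to say the delivery was late but the staff were helpful."

# Structured data can be filtered and summed directly by field.
lisbon_spend = sum(r.total_spend for r in structured if r.city == "Lisbon")

# Unstructured data needs interpretation; a naive keyword check is the simplest form.
mentions_delay = "late" in unstructured.lower()
```

A keyword check like this is a deliberately crude stand-in; it is shown only to illustrate why unstructured data takes more work to analyze.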

Historically, unstructured data has been difficult to analyze

The ethics of handling large volumes of data



Ethics are particularly important when working with AI and ML

Exercise | 10 min | Class | Page 8

Take 2 mins to play with the intersections between these datasets: user data, corporate data, and industry data. Consider two or more datasets and ask yourself:
1. What insight could I gain if these datasets were combined?
2. How can I explore this data further and turn it into actionable insights?
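
One way to picture the first exercise question is the sketch below, which combines a toy user dataset with a toy corporate dataset. All product names, ratings, and sales figures are hypothetical, and the `combine` helper is an illustration rather than part of any Google Cloud product.

```python
# Hypothetical user data: customer ratings per product.
user_data = [
    {"customer": "c1", "product": "desk lamp", "rating": 2},
    {"customer": "c2", "product": "desk lamp", "rating": 1},
    {"customer": "c3", "product": "office chair", "rating": 5},
]

# Hypothetical corporate data: sales and returns per product.
corporate_data = {
    "desk lamp": {"units_sold": 120, "returns": 30},
    "office chair": {"units_sold": 80, "returns": 2},
}


def combine(user_rows, corporate_rows):
    """Join average rating (user data) with return rate (corporate data) per product."""
    ratings = {}
    for row in user_rows:
        ratings.setdefault(row["product"], []).append(row["rating"])

    combined = {}
    for product, values in ratings.items():
        sales = corporate_rows[product]
        combined[product] = {
            "avg_rating": sum(values) / len(values),
            "return_rate": sales["returns"] / sales["units_sold"],
        }
    return combined


insights = combine(user_data, corporate_data)
```

Neither dataset alone shows that the low-rated product is also the one returned most often; the combined view does, which is the kind of insight the exercise asks about.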

Lesson 02: Data Consolidation and Analytics

Data is central to digital transformation



The impact of data-driven decision making

69% of companies report that they have not created a data-driven organization.

Where is your company or customer data currently stored? [Discussion]

When data is stored on premises, you are responsible for the IT infrastructure

With cloud, you can ‘rent’ space from public cloud providers

Data is no longer just retrospective



Data management touches on a lot of different areas, from storage to analytics

Data management solutions: databases [Discussion]

Database: An organized collection of data, generally stored in tables and accessed electronically from a computer system.

Google Cloud database solutions: Cloud SQL

Cloud SQL (SQL stands for Structured Query Language): A fully managed relational database service, that is, a managed relational database management system (RDBMS). It integrates easily with existing applications and Google Cloud services like Google Kubernetes Engine and BigQuery.

Google Cloud data management solutions: Cloud Spanner

Cloud Spanner: A fully managed database service, designed for global scale. Data is automatically and instantly copied across regions. This replication means that if one region goes offline, the organization’s data can still be retrieved from another region.

Databases vs. data warehouses

Database: Databases store transactional data in an online tabulated fashion. They are built and optimized to ingest large amounts of data from many different sources efficiently.

Data warehouse: Data warehouses assemble data from multiple sources, including databases. They are built to rapidly analyze and report on massive, multi-dimensional datasets on an ongoing basis, in real time.
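
The difference in workload can be sketched with Python's built-in `sqlite3` module. This is only a local stand-in: it is neither Cloud SQL nor BigQuery, and the `bookings` table and its values are hypothetical. The first half shows database-style transactional writes; the second half shows the kind of aggregate query a data warehouse is built to run at much larger scale.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, room TEXT, day TEXT, price REAL)"
)

# Transactional (database-style) workload: small writes recorded as they happen.
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO bookings (room, day, price) VALUES (?, ?, ?)",
        [
            ("A", "2024-01-01", 50.0),
            ("A", "2024-01-02", 50.0),
            ("B", "2024-01-01", 80.0),
        ],
    )

# Analytical (warehouse-style) workload: one query that scans and aggregates many rows.
revenue_by_room = dict(
    conn.execute(
        "SELECT room, SUM(price) FROM bookings GROUP BY room ORDER BY room"
    ).fetchall()
)
```

In this toy case both workloads run on the same engine; the point of the slide is that at scale they are usually served by different systems, each optimized for one of the two patterns.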

If you want to analyze the data you capture with a database, you need a data warehouse

Cloud data warehouses consolidate data that is structured and semi-structured

Data warehouses transform and analyze unstructured data

Google Cloud data warehouse solutions: BigQuery

BigQuery: A fully managed data warehouse, with downtime-free upgrades and maintenance, and seamless scaling. BigQuery allows you to analyze petabytes of data at incredibly fast speeds and with zero operational overhead.

Most data warehouse providers link storage and compute together

BigQuery is serverless

BigQuery example: Ocado

80x faster, 30% less cost

Additional tools: Pub/Sub and Dataflow

Pub/Sub: A service for real-time data ingestion.
Dataflow: A service for large-scale data processing.

Pub/Sub and Dataflow bring unstructured data into the cloud and transform it into semi-structured data
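
The ingest-then-transform pattern can be sketched with the Python standard library alone. This does not use the Pub/Sub or Dataflow client libraries: `queue.Queue` stands in for a message topic, and the `publish` and `transform` helpers and the message format are hypothetical.

```python
import json
import queue

# Stand-in for a message topic; not the Pub/Sub client API.
topic = queue.Queue()


def publish(raw_line: str) -> None:
    """Ingestion step: accept a raw text message as it arrives."""
    topic.put(raw_line)


def transform(raw_line: str) -> dict:
    """Processing step: parse 'key=value key=value' text into named fields."""
    return dict(part.split("=", 1) for part in raw_line.split())


# Hypothetical raw events arriving from an application.
publish("user=42 action=login")
publish("user=7 action=purchase")

# Drain the queue and turn each raw line into a semi-structured record.
records = []
while not topic.empty():
    records.append(transform(topic.get()))

# JSON is a common semi-structured format for downstream analysis.
as_json = json.dumps(records)
```

The split into an ingestion step and a separate processing step mirrors the division of labor the slide describes between the two services.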

Data management solutions: data lakes [Discussion]

Data lake: A repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data. It can store data in its native format and process any variety of it, ignoring size limits.

Data lakes are made up of many different products

Structured: Cloud SQL, Cloud Spanner, BigQuery
Semi-structured: Cloud Datastore, Cloud Bigtable
Unstructured: Cloud Storage

It’s challenging to identify the right business intelligence solution

Google Cloud business intelligence solutions: Looker

Looker: A data platform that sits on top of any analytics database and makes it simple to describe your data and define business metrics.

Exercise | 10 min | Class | Page 9

Practice identifying the best data management solution (databases, data warehouses, and data lakes) for various use cases.

Example 1
A coworking office rental business uses an online tool to record daily desk, room, and meeting bookings. If a client books a desk for the day, that data is captured and desk availability is updated in real time on all customer channels. The rental business now wants to do even more with its data. It wants to use multiple types and sources of data to gain insights about facility quality and, ultimately, to improve its service to customers.

Which data management solution best fits the coworking business’s needs?

Answer: Data warehouse

Example 2
A bank is launching a mobile banking app and wants to track money transfers from one account to another. It wants to make sure the transferred figure is updated in the bank’s records in real time and that the user can see the most up-to-date account balance.

Which data management solution best fits this bank’s needs?

Answer: Database

Example 3
An online music streaming company stores raw music data that is accessed by users worldwide and constantly analyzed by its systems. It wants to geographically disperse backup copies of its raw data in very large volumes. This data comes in a variety of formats, must retain full fidelity, and must be accessible for processing and analysis at any time, at short notice.

Which data management solution best fits this streaming service’s needs?

Answer: Data lake

Example 4
A lifestyle company is launching a casual dating mobile app. By signing in to the app through social media, users provide details such as gender, location, and interests, as well as headshot images. The lifestyle company wants to display this information to other app users through a compatibility-based algorithm, and needs a cost-effective data management solution that can hold large volumes of data. It also can’t afford downtime that would drive users away.

Which data management solution best fits this company’s needs?

Answer: Database

Afternoon break
Please return in 15 minutes

Lesson 03: Innovation with Machine Learning

To understand ML, start by thinking about your business data

What is machine learning? [Discussion]

What is machine learning?

Artificial Intelligence (AI): A broad field or term that describes any kind of machine capable of a task that normally requires human intelligence, such as visual perception, speech recognition, decision-making, or translation between languages.

Machine Learning (ML): A branch within the field of AI. Computers that can "learn" from data and make predictions or decisions without being explicitly programmed to do so, using algorithms or models to analyze data. These algorithms use historical data as input to predict new output values.
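
As a minimal sketch of "using historical data as input to predict new output values", the example below fits a straight line to a handful of past observations with ordinary least squares and then predicts an unseen value. The data points are invented, and a one-variable linear fit is only one of the simplest possible models, chosen here for brevity.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: the relationship is never written into the code,
# it is learned from the examples.
history_x = [1, 2, 3, 4]
history_y = [3, 5, 7, 9]

slope, intercept = fit_line(history_x, history_y)

# Predict the output for an input the model has not seen.
prediction = slope * 5 + intercept
```

The rule connecting inputs to outputs is recovered from the data rather than programmed explicitly, which is the distinction the ML definition draws.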

Machine learning solves many kinds of problems

Classification Regression Clustering



Can you think of any examples of ML that you come across in your daily life? [Discussion]

The accuracy of ML predictions relies on data that is correct and free of errors

Machine Learning (ML): Data is used by ML models to derive predictive insights and make repeated decisions.

The best data has three qualities:

01 It has coverage
Data coverage: The scope of a problem domain and all possible scenarios it can account for, that is, all possible input and output data.

02 It’s clean or consistent
Data cleanliness: Sometimes called data consistency. Data is considered dirty or inconsistent if it includes or excludes anything that might prevent an ML model from making accurate predictions.

03 It’s complete
Data completeness: This refers to whether there is sufficient data available to deliver meaningful inferences and decisions.
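
The three qualities can be turned into simple checks, as in the sketch below. The function name, the sample rows, and the thresholds are all hypothetical, and each check is a deliberate simplification: coverage is reduced to "every expected label appears", cleanliness to "no missing values", and completeness to "at least a minimum number of rows".

```python
def assess_quality(rows, expected_labels, min_rows):
    """Return a pass/fail flag for each of the three data qualities."""
    labels_seen = {row["label"] for row in rows if row["label"] is not None}
    return {
        # Coverage: every scenario we expect to predict appears in the data.
        "coverage": set(expected_labels) <= labels_seen,
        # Cleanliness: no record has a missing field.
        "clean": all(value is not None for row in rows for value in row.values()),
        # Completeness: there are enough records to learn from.
        "complete": len(rows) >= min_rows,
    }


# Hypothetical transaction data with one missing amount.
rows = [
    {"amount": 20.0, "label": "ok"},
    {"amount": 950.0, "label": "fraud"},
    {"amount": None, "label": "ok"},
]

report = assess_quality(rows, expected_labels=["ok", "fraud"], min_rows=3)
```

Here the dataset covers both expected labels and meets the row threshold, but the missing amount means it fails the cleanliness check.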

Data is the only tunnel through which your model views the world

Google Cloud provides tools to support an entire ML workflow

Identify the best approach for your ML project

Do you have your own training data?
No: Pre-trained APIs
Yes: Are you writing the model code yourself?
    No: AutoML (Vertex AI)
    Yes: Custom model tooling (Vertex AI)
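
Read as a decision tree, the two questions on this slide can be sketched as a small function. The function name and parameters are hypothetical, and the branch order (training data first, then model code) is one reading of the slide's flattened diagram.

```python
def choose_approach(has_training_data: bool, writes_model_code: bool) -> str:
    """Map the slide's two yes/no questions to an ML approach."""
    if not has_training_data:
        # No training data of your own: use a model Google has already trained.
        return "Pre-trained APIs"
    if not writes_model_code:
        # Own data, but no custom model code: let the platform train the model.
        return "AutoML"
    # Own data and own model code.
    return "Custom model tooling"


approach = choose_approach(has_training_data=True, writes_model_code=False)
```

Under this reading, the second question only matters once you have your own training data.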

APIs allow developers to quickly and easily train custom models [Pre-trained APIs]

Google Cloud ML tools: AI Hub [Pre-trained APIs]

AI Hub: A hosted repository of plug-and-play AI components.

Google Cloud ML tools: Vision API [Pre-trained APIs]

Vision API: Powerful pre-trained machine learning models, which use Google’s data, to automatically detect faces, objects, text and even sentiment in images.

Google Cloud ML tools: Natural Language API [Pre-trained APIs]

Natural Language API: An API that discovers syntax, entities, and sentiment in text using Google data, and classifies text into a predefined set of categories.

Train models using your own data with Vertex AI [AutoML]

Vertex AI: A unified managed platform for Google Cloud’s ML building services. You can either train an existing ML model with your data, or build a custom ML model which you train with your data.

Google Cloud tools: AutoML Vision API [AutoML]

AutoML Vision API: An API that automates the training of your own machine learning models. A developer can simply upload a custom batch of images and train an image classification model.

Google Cloud tools: AutoML Natural Language [AutoML]

AutoML Natural Language: Enables you to build and deploy custom machine learning models that analyze documents, categorize them, identify entities within them, or assess attitudes within them.

A huge range of Google Cloud solutions with Vertex AI [AutoML]

Vertex AI: the essential platform for creating custom end-to-end AI models [Custom model tooling]

1. Gather data
2. Feature engineering
3. Build models
4. Deploy and monitor models

Vertex AI: tools for data labeling, training, predictions, and virtual machine imaging [Custom model tooling]

Google Cloud AI solutions

Document AI
Contact Center AI
Cloud Talent Solution

Google Cloud AI foundational infrastructure: TensorFlow

TensorFlow: An end-to-end open source platform for machine learning, with a comprehensive, flexible ecosystem of tools, libraries and community resources.

Standardize your ML process and make it more reliable

MLOps: An ML engineering culture and practice that aims to unify ML system development (Dev) and ML system operation (Ops). This means advocating for automation and monitoring at all steps of ML system construction, including integration, testing, releasing, deployment and infrastructure management.

Strong data is the foundation of ML

1. IT infrastructure, applications, and data management
2. Build data lake and/or data warehouse
3. Data analytics: data-driven decisions
4. Data science (ML): rapid prototyping and reproducible experiments
5. MLOps: deploy ML models to launch more intelligent products and features

Exam practice questions | 10 min | Class | Page 23

Let’s practice answering 3 sample CDL certification exam questions.

1. Take 5 mins to attempt the questions.
2. We’ll spend 5 mins going through them together.

Question 1
Images and videos are examples of what type of
data?
A. Unstructured
B. Structured
C. Semi-structured
D. Organized

Question 1
Answer: A) Unstructured
Why? Images and videos are not arranged according to a pre-set data model or schema. Since they have no predefined organization and tend to be qualitative, they are seen as unstructured data.

Question 2
What is a data lake?
A. A database for storing structured and unstructured data
B. A large pool of data accessible to database administrators only
C. A repository of data from various sources stored in its native format for processing
D. A refined data repository accessible by employees and select customers

Question 2
Answer: C) A repository of data from various sources stored in its native format for processing
Why? The data in a data lake can come from different sources, but it is stored in its raw, native format and is not transformed when ingested.

Question 3
What is a common business problem that machine learning solves?
A. Creating personalized customer experiences
B. Identifying competitor differentiation
C. Automating an inefficient internal process
D. Leveraging sales figures to identify trends

Question 3
Answer: A) Creating personalized customer experiences
Why? ML is best used for problems that need predictions based on repeated decisions rather than statistical analysis based on historical data. Personalization is an example of a problem ML is good at solving.

End of day 1 training


See you tomorrow!
