MLGCP
Machine Learning Engineer Study Guide
v2.0
Contents
Introduction
Learn more about the exam
Outline of learning path content
Exam knowledge areas
Section 1: Architecting low-code ML solutions
Section 2: Collaborating within and across teams to manage data and models
Section 3: Scaling prototypes into ML models
Section 4: Serving and scaling models
Section 5: Automating and orchestrating ML pipelines
Section 6: Monitoring ML solutions
Glossary
List of Google products and solutions
Introduction
The Google Cloud Professional Machine Learning Engineer training and exam are intended for
individuals with a strong technical background and hands-on experience in machine learning (ML)
who want to demonstrate their expertise in designing, building, and deploying machine learning
solutions on Google Cloud.
The exam validates a candidate’s ability to complete the following course objectives:
● Describe how to develop and implement machine learning solutions using low-code tools and
services on Google Cloud.
● Explain how to effectively manage data, prototype models, and collaborate within and across
teams to build robust ML solutions.
● Describe how to deploy and scale ML models in production using various serving strategies
and infrastructure on Google Cloud.
● Identify the key tasks and considerations for monitoring, testing, and troubleshooting ML
solutions to ensure performance, reliability, and responsible AI practices.
Learn more about the exam
The Professional Machine Learning Engineer Certification exam assesses the knowledge and skills
required to effectively leverage Google Cloud's machine learning capabilities to solve real-world
problems.
The Professional Machine Learning Engineer exam assesses knowledge in six areas:
● Architecting low-code ML solutions.
● Collaborating within and across teams to manage data and models.
● Scaling prototypes into ML models.
● Serving and scaling models.
● Automating and orchestrating ML pipelines.
● Monitoring ML solutions.
To get started, you can explore the updated Machine Learning Crash Course, which provides a
foundation in machine learning fundamentals and offers guided, hands-on coding experience
through Codelabs.
For a comprehensive learning journey towards the Professional Machine Learning Engineer
certification, sign up for the Professional Machine Learning Engineer Learning Path through Google
Cloud Skills Boost, Coursera, or Pluralsight.
Learn more about how and where to take the exam on the Professional Machine Learning Engineer
website.
Outline of learning path content
To provide a concise overview of the core curriculum, introductory and summary modules that
typically appear at the beginning and end of these courses have been excluded from this list. These
modules generally recap the course scope and content.
Lab 10: Prepare Data for ML APIs on Google Cloud: Challenge Lab
Mini-course: 8 lessons
Lab 2: ETL Processing on Google Cloud Using Dataflow and BigQuery (Python)
Lab 4: Engineer Data for Predictive Modeling with BigQuery ML: Challenge Lab
06 Feature Engineering
Module 3: Building Neural Networks with the TensorFlow and Keras API
11 Introduction to Generative AI
Mini-course: 1 lesson
Module 5: Continuous Training with Multiple SDKs, KubeFlow & AI Platform Pipelines Predictions
Lab 5: Build and Deploy Machine Learning Solutions with Vertex AI: Challenge Lab
Module 2: Prompts
Module 1: AI Privacy
Module 2: AI Safety
Exam knowledge areas
● Preparing data for AutoML (e.g., feature selection, data labeling, Tabular Workflows on AutoML).
● Using available data (e.g., tabular, text, speech, images, videos) to train custom models.
Related content: Introduction to AI and Machine Learning on Google Cloud; Working with Notebooks in Vertex AI

● Data preprocessing (e.g., Dataflow, TensorFlow Extended [TFX], BigQuery).
Related content: Engineer Data for Predictive Modeling with BigQuery ML

● Choosing the appropriate Google Cloud environment for development and experimentation (e.g., Vertex AI Experiments, Kubeflow Pipelines, Vertex AI TensorBoard with TensorFlow and PyTorch) given the framework.
Related content: Build and Deploy Machine Learning Solutions on Vertex AI (Lab 1); Machine Learning Operations (MLOps) with Vertex AI: Model Evaluation (Module 2)

● Organizing training data (e.g., tabular, text, speech, images, videos) on Google Cloud (e.g., Cloud Storage, BigQuery).
Related content: Introduction to AI and Machine Learning on Google Cloud (Module 4)

● Ingestion of various file types (e.g., CSV, JSON, images, Hadoop, databases) into training.
Related content: Introduction to Large Language Models; Machine Learning Operations (MLOps) for Generative AI

● Training using different SDKs (e.g., Vertex AI custom training, Kubeflow on Google Kubernetes Engine, AutoML, tabular workflows).
● Using distributed training to organize reliable pipelines.
● Hyperparameter tuning.
Related content: Production Machine Learning Systems (Module 3, Module 6); Build and Deploy Machine Learning Solutions on Vertex AI (Lab 3)

● Batch and online inference (e.g., Vertex AI, Dataflow, BigQuery ML, Dataproc).
● Using different frameworks (e.g., PyTorch, XGBoost) to serve models.
● Organizing a model registry.
Related content: Feature Engineering (Module 1); Production Machine Learning Systems (Module 2); Build and Deploy Machine Learning Solutions on Vertex AI (Lab 3)

● Vertex AI public and private endpoints.
Related content: Machine Learning Operations (MLOps) with Vertex AI: Manage Features

● Choosing appropriate hardware (e.g., CPU, GPU, TPU, edge).
Related content: Production Machine Learning Systems (Module 2)

● Scaling the serving backend based on the throughput (e.g., Vertex AI Prediction, containerized serving).
Related content: Build and Deploy Machine Learning Solutions on Vertex AI (Labs 1-3)

● Tuning ML models for training and serving in production (e.g., simplification techniques, optimizing the ML solution for increased performance, latency, memory, throughput).
Related content: Create ML Models with BigQuery ML (Lab 2, Lab 3)
● Establishing continuous evaluation metrics (e.g., Vertex AI Model Monitoring, Explainable AI).
● Monitoring for training-serving skew.
Related content: Build and Deploy Machine Learning Solutions on Vertex AI (Lab 1, Lab 3, Lab 5); Production Machine Learning Systems (Module 2, Module 6)
Glossary *
* This glossary provides a brief overview of key terms used in each section. For a comprehensive list of machine learning
terminology with detailed explanations and examples, refer to the Google Machine Learning Glossary at
developers.google.com/machine-learning/glossary.
Section 1
ML (Machine Learning): A subset of AI that allows computers to learn without being explicitly
programmed, in contrast to traditional programming, where the computer is told explicitly what to
do.
AutoML: Automated machine learning. AutoML aims to automate the process of developing and deploying an ML model.
BigQuery ML (BQML): BigQuery Machine Learning allows users to use SQL (Structured Query Language) to implement the model training and serving phases.
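As an illustration, here is a minimal sketch of that workflow from Python using the BigQuery client library; the dataset, table, and column names are hypothetical.

```python
# Minimal sketch: training and querying a BQML model from Python.
# The dataset, table, and column names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()  # uses default project credentials

# CREATE MODEL runs model training entirely inside BigQuery using SQL.
train_sql = """
CREATE OR REPLACE MODEL `my_dataset.churn_model`
OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
SELECT tenure_months, monthly_charges, churned
FROM `my_dataset.customers`
"""
client.query(train_sql).result()  # blocks until training completes

# ML.PREDICT serves predictions with the same SQL interface.
predict_sql = """
SELECT predicted_churned
FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
                (SELECT tenure_months, monthly_charges
                 FROM `my_dataset.new_customers`))
"""
for row in client.query(predict_sql).result():
    print(row.predicted_churned)
```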
Classification model: A type of machine learning model that predicts a category from a fixed
number of categories.
Custom Training: A code-based solution for building ML models that allows the user to code their
own ML environment, giving them flexibility and control over the ML pipeline.
Deep Learning packages: A suite of preinstalled packages that include support for the TensorFlow
and PyTorch frameworks.
Transfer learning: A technique where a pre-trained model is adapted for a new, related task.
Hyperparameter: A parameter whose value is set before the learning process begins.
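For illustration, a minimal Keras sketch combining the two terms above: a pre-trained model is adapted to a new task (transfer learning), with the learning rate and epoch count fixed before training begins (hyperparameters). The architecture and values are illustrative choices, not a prescribed recipe.

```python
# Minimal transfer-learning sketch in Keras: adapt a pre-trained
# image model to a new two-class task. learning_rate and epochs are
# hyperparameters, set before training; values are illustrative.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the pre-trained weights

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # new task head
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # hyperparameter
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"])
# model.fit(train_ds, epochs=5)  # epochs is another hyperparameter
```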
Large language models (LLM): General-purpose language models that can be pre-trained and
fine-tuned for specific purposes.
Neural Architecture Search: A technique for automating the design of artificial neural networks
(ANNs).
Deep Learning: A subset of machine learning that adds many layers between input data and output results so that a machine can learn from data in much greater depth.
Generative AI: Produces content and performs tasks based on requests. Generative AI relies on
training extensive models like large language models, which are a type of deep learning model.
Foundation model: A large AI model trained on a massive dataset, capable of performing a wide
range of tasks and serving as a basis for fine-tuning for specific applications.
Prompt design: The process of crafting effective prompts to elicit desired responses from
generative AI models.
Supervised learning: Uses labeled data to predict outcomes. Includes classification (predicting
categories) and regression (predicting numbers).
Unsupervised learning: Uses unlabeled data to find patterns. Includes clustering (grouping data),
association (finding relationships), and dimensionality reduction (simplifying data).
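A minimal scikit-learn sketch contrasting the two paradigms on toy data; the data and model choices are illustrative.

```python
# Supervised vs. unsupervised learning with scikit-learn (toy data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[1.0, 2.0], [1.5, 1.8], [8.0, 8.0], [9.0, 9.5]])

# Supervised: labeled data -> predict a category (classification).
y = np.array([0, 0, 1, 1])
clf = LogisticRegression().fit(X, y)
print(clf.predict([[1.2, 1.9]]))  # -> [0]

# Unsupervised: unlabeled data -> find groupings (clustering).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignment for each point
```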
MLOps (Machine Learning Operations): Turns ML experiments into production and helps deploy,
monitor, and manage ML models.
Responsible AI: The development and use of AI in a way that prioritizes ethical considerations,
fairness, accountability, safety, and transparency.
Section 2
Accuracy: A metric for evaluating classification models that measures the proportion of correct
predictions out of the total number of predictions.
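A small sketch of the metric exactly as defined above: correct predictions divided by total predictions.

```python
# Accuracy: proportion of correct predictions out of all predictions.
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 3 of 4 correct -> 0.75
```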
Arbiter models: Specialized language models used in model-based evaluation to mimic human
evaluation and compare the quality of responses from different models.
Bias and fairness: Challenges in model evaluation related to inherent biases in data that can lead to
discriminatory outcomes.
Binary evaluation: A simple evaluation type that involves a yes/no or pass/fail judgment.
Categorical evaluation: An evaluation type that offers more than two options for slightly more
nuanced feedback.
Continuous evaluation: The ongoing process of evaluating a deployed model's performance using
new data to identify any potential decline in effectiveness.
Customization: The ability to tailor workspaces to specific workflows, preferences, and project
requirements.
Diversity metrics: Metrics that focus on measuring the variety and range of outputs a model can
generate.
Evaluation metrics: Numerical scores used to quantify a model's performance on specific tasks.
GPUs: Graphical processing units that accelerate tasks like model training and inference, data
analytics, and big data processing workloads by efficiently processing large datasets and performing
parallel computations.
Ground truth dataset: A dataset containing the "correct" answers or labels, typically used as a
reference for evaluating model performance.
Jupyter Notebooks: Interactive computing environments that allow users to create and share
documents that contain live code, equations, visualizations, and narrative text.
Model evaluation: The process of assessing how well a machine learning model performs its
intended task.
Multi-task evaluation: An evaluation type that combines multiple judgment types, such as numerical
metrics and human ratings, for comprehensive evaluation.
Numerical evaluation: An evaluation type that assigns a quantitative score to model outputs.
Notebook: A single document that combines executable code and rich text, used for editing and executing code.
Persistent disk: A type of storage that retains data even after a virtual machine (VM) instance is shut
down or terminated. Persistent disks are typically used to store critical data that needs to be
preserved, such as operating system files, application data, and user files.
Principals: Users, groups, domains, or service accounts that can be granted access to a resource in
Colab Enterprise.
Prompt engineering: The process of designing and refining prompts or instructions given to LLMs to
improve their performance and control their output.
Roles: Sets of permissions that determine what actions principals can take on a resource in Colab
Enterprise.
Text evaluation: An evaluation type that uses human-generated feedback in the form of comments,
critiques, or ratings to assess model output quality.
Vertex AI Pipelines: Automates the deployment process and ensures consistency between training
and production environments.
Section 3
AI Foundations: Building blocks for AI solutions, including cloud essentials, data tools, and AI
development processes.
Dialog tuning: A specialized form of instruction tuning where language models are trained to engage
in conversations by predicting the most appropriate response in a given context. Dialog-tuned models
excel in handling back-and-forth interactions and are well-suited for chatbot applications.
Fine tuning: Adapting a pre-trained foundation model to a specific task or domain using a smaller
dataset.
Foundation models: Large-scale, pre-trained AI models that can be adapted for various tasks,
serving as a base for building specific applications.
Hybrid machine learning models: Models that work well both on-premises and in the cloud.
Inference: The process of using a trained machine learning model to make predictions on new data.
Instruction tuning: A training approach where language models are trained to follow instructions and
generate responses based on the given input. Instruction-tuned models are more adaptable to
various tasks and can perform well even with limited training data.
Model Garden: A repository of generative AI models, both from Google and open-source, that
developers can access and utilize.
Multimodal model: A model capable of processing and generating content in multiple modalities,
such as text, images, and video.
Predictive AI: AI models that focus on predicting future outcomes based on patterns in data.
Section 4
Batch serving: Serving features for high throughput and serving large volumes of data for offline
processing.
Container: An abstraction that packages applications and libraries together so that the applications
can run on a greater variety of hardware and operating systems.
Drift: The change in an entity relative to a baseline. In the case of production ML models, this is the
change between the real-time production data and a baseline data set, likely the training set, that is
representative of the task the model is intended to perform.
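One way to quantify such drift for a numeric feature is to compare the production distribution against the training baseline. The sketch below uses a two-sample Kolmogorov-Smirnov test; this is an illustrative choice, not the method any particular monitoring service uses.

```python
# Quantify drift for one numeric feature: compare production data
# against the training baseline with a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
baseline = rng.normal(loc=0.0, scale=1.0, size=5_000)    # training data
production = rng.normal(loc=0.4, scale=1.0, size=5_000)  # shifted serving data

stat, p_value = ks_2samp(baseline, production)
if p_value < 0.01:  # alert threshold is an illustrative choice
    print(f"Drift detected (KS statistic={stat:.3f})")
```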
Entity type: A group of features that are related to each other in some way.
Extrapolation: When models are asked to make predictions on points in feature space that are far
away from the training data.
Feature ingestion: The process of importing feature values computed by your feature engineering
jobs into a feature store.
Feature serving: The process of exporting stored feature values for training or inference.
Feature Store: A centralized repository for organizing, storing, and serving machine learning features.
Feature value: The value of a feature at a specific point in time, as captured by Feature Store.
Model Registry: A tool for registering, organizing, tracking, and versioning trained and deployed ML
models.
Modular design: A design approach that makes programs more maintainable, as well as easier to reuse, test, and fix, because it allows engineers to focus on small pieces of code rather than the entire program.
Monolithic design: A software program that is not modular. Few software programs use this design.
Online serving: Low-latency data retrieval of small batches of data for real-time processing, like for
online predictions.
Section 5
AI Platform Pipelines: A service that makes it easy to schedule Kubeflow pipeline runs, either one-off
or recurring.
AI Platform Training: A service that allows users to submit training images directly to benefit from
distributed training on the cloud.
Cloud ML Engine: A serverless execution environment that frees data scientists from worrying about infrastructure.
Composability: The ability to compose microservices together, with the option to use only what makes sense for your problem.
Continuous training: The practice of periodically retraining machine learning models to maintain their
performance as data distributions change.
Custom job: A way to run your training code on Vertex AI. When creating a custom job, you specify
settings such as the location, job name, Python package URIs, and worker pool specifications.
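A hedged sketch of submitting such a job with the Vertex AI Python SDK (google-cloud-aiplatform); the project, bucket, and container image names are placeholders, and the worker pool specification is abbreviated to its most common fields.

```python
# Sketch: submit a Vertex AI custom job. Names are placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-staging-bucket")

job = aiplatform.CustomJob(
    display_name="my-training-job",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {
            # a container image holding your training code
            "image_uri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest",
        },
    }],
)
job.run()  # blocks until the job completes
```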
Custom model: A type of machine learning model that is created and trained by the user to meet
specific needs.
Federated Learning: A new frontier in machine learning that continuously trains the model on device,
and then combines model updates from a federation of users’ devices to update the overall model.
Inference-on-the-Edge: Doing predictions on the edge, on the device itself, due to connectivity
constraints.
Kubeflow: An open-source platform for building and deploying machine learning workflows, including
pipelines.
Kubeflow Pipelines: A series of steps or operations in a machine learning workflow, often including
data preprocessing, model training, and model deployment.
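A minimal sketch of defining and compiling such a pipeline with the KFP SDK (v2-style syntax); the component bodies are placeholders standing in for real preprocessing and training steps.

```python
# Minimal Kubeflow pipeline sketch with the KFP SDK (v2 syntax).
from kfp import dsl, compiler

@dsl.component
def preprocess(message: str) -> str:
    return message.upper()

@dsl.component
def train(data: str) -> str:
    return f"model trained on: {data}"

@dsl.pipeline(name="toy-training-pipeline")
def pipeline(message: str = "raw data"):
    step1 = preprocess(message=message)
    train(data=step1.output)  # steps chain via their outputs

# Compile to a spec that Kubeflow or Vertex AI Pipelines can run.
compiler.Compiler().compile(pipeline, package_path="pipeline.json")
```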
ML Metadata: Data about machine learning models and datasets, such as model architecture, training
data, and evaluation metrics.
model.py: Contains the core machine learning logic for your training job, invoked by task.py.
Parameter-efficient tuning: A method for fine-tuning large language models that involves training
only a subset of the model's parameters, reducing computational costs and resources.
Pre-built ML APIs: Pre-made machine learning models that are ready to use for common tasks, such
as image recognition, natural language processing, and translation.
Quantization: A technique that compresses each float value to an 8-bit integer, reducing the size of
the files and the resources required to handle the data.
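A minimal numeric sketch of this idea using affine quantization (a scale and zero point map floats onto 8-bit integers); the values are illustrative.

```python
# Affine quantization sketch: float32 -> uint8 and back.
import numpy as np

x = np.array([-1.2, 0.0, 0.5, 2.4], dtype=np.float32)

scale = (x.max() - x.min()) / 255.0      # float range covered per int step
zero_point = np.round(-x.min() / scale)  # int value that represents 0.0

q = np.clip(np.round(x / scale + zero_point), 0, 255).astype(np.uint8)
x_restored = (q.astype(np.float32) - zero_point) * scale

print(q)           # 8-bit representation (4x smaller than float32)
print(x_restored)  # close to x, within quantization error
```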
task.py: The entry point to your code when sending training jobs to Vertex AI. It handles job-level
details like parsing command-line arguments, run time, output locations, and hyperparameter tuning.
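A hedged sketch of what such a task.py can look like; the train_and_evaluate function in model.py is a hypothetical name used only for illustration.

```python
# Sketch of a minimal task.py entry point for a training job:
# parse job-level arguments, then hand off to model.py.
import argparse
import model  # your model.py with the core ML logic

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--job-dir", required=True,
                        help="GCS location for checkpoints and exports")
    parser.add_argument("--learning-rate", type=float, default=0.001,
                        help="hyperparameter passed in by the tuning service")
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    model.train_and_evaluate(   # hypothetical function in model.py
        job_dir=args.job_dir,
        learning_rate=args.learning_rate,
        epochs=args.epochs,
    )
```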
Vertex AI Model Monitoring: A service for monitoring the performance of machine learning models
in production.
Section 6
Ablation analysis: Comparing a model's performance with and without a specific feature to assess its
value.
Adversarial testing: Evaluating AI models with malicious or harmful input to identify vulnerabilities.
AI safety: Building AI systems that are safe, secure, and beneficial to society, aligning with principles
like fairness, accountability, and privacy.
AI transparency: Earning stakeholders’ trust by documenting each system and communicating on the basis of that documentation.
Bucketing: A de-identification technique that generalizes a sensitive value by replacing it with a range
of values. It is not reversible and does not maintain referential integrity.
Cloud Data Loss Prevention (DLP) API: A fully managed service designed to help discover, classify,
and protect valuable data assets with ease. It provides de-identification, masking, tokenization, and
bucketing, as well as the ability to measure re-identification risk, and sensitive data intelligence for
security assessments.
Cloud Key Management Service (KMS): A cloud-hosted key management service that lets you
manage encryption keys, allowing you to create, import, and manage cryptographic keys, and
perform cryptographic operations in a single centralized cloud service.
Constitutional AI: Scaling supervision using AI to evaluate and tune models based on safety
principles.
Data leakage: When the target variable leaks into training data, leading to unrealistic model
performance.
Data security: The protection of sensitive data used in AI systems, with the goal of minimizing the
use of sensitive data through de-identification and randomization techniques.
De-identification: The process of removing or modifying personally identifiable information (PII) from
data to protect individual privacy. Techniques include redaction, replacement, masking, tokenization,
bucketing, and shifting.
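The toy sketch below applies three of these techniques (redaction, masking, bucketing) to a sample record. Real pipelines would typically use the Cloud DLP API; this illustrates only the transformations themselves.

```python
# De-identification sketch: redaction, masking, bucketing (toy record).
record = {"name": "Jane Doe", "email": "jane@example.com", "age": 34}

# Redaction: delete the sensitive value entirely.
redacted = {k: v for k, v in record.items() if k != "name"}

# Masking: replace characters with a surrogate symbol.
user, domain = record["email"].split("@")
masked_email = user[0] + "*" * (len(user) - 1) + "@" + domain

# Bucketing: generalize a value into a range.
decade = (record["age"] // 10) * 10
age_bucket = f"{decade}-{decade + 9}"

print(redacted, masked_email, age_bucket)  # ... j***@example.com 30-39
```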
Differential privacy: A rigorous approach that adds noise to data or model parameters to ensure that
the inclusion or exclusion of any individual's data does not significantly affect the output or result of
the analysis. Key parameters include the privacy parameter (epsilon) and sensitivity.
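A minimal sketch of the Laplace mechanism, the classic way these two parameters combine: noise with scale sensitivity/epsilon is added to a query result before release. The counts and parameter values are illustrative.

```python
# Laplace mechanism sketch for differential privacy.
import numpy as np

true_count = 412   # e.g., number of users matching a query
sensitivity = 1.0  # one person changes the count by at most 1
epsilon = 0.5      # privacy parameter: smaller = more private

rng = np.random.default_rng()
noisy_count = true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)
print(round(noisy_count))  # privacy-protected answer to release
```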
Encryption: The process of converting data into a secret code to prevent unauthorized access.
Google Cloud provides default encryption at rest and in transit, as well as Cloud KMS for additional
control over encryption keys.
Federated Learning: A distributed machine learning approach that trains models using decentralized
data on devices like smartphones, preserving data privacy by avoiding the need to share raw data. It
can be used for personalization, model updates, and federated analytics.
Input/output safeguards: Protective measures to ensure AI behavior aligns with safety and ethical
standards.
Instruction tuning and RLHF: Fine-tuning language models using instructions and human feedback
to embed safety concepts and align with human values.
Intrinsic interpretability: The property of models that are inherently transparent and can be understood without additional tools, such as linear regression models.
Masking: A de-identification technique that replaces some or all characters of a sensitive value with a
surrogate value. It is not reversible and does not maintain referential integrity.
Model agnostic: Model agnostic methods can be applied to a wide range of machine learning
models, as they analyze how changes in input features affect model output.
Model Cards: Short documents accompanying trained machine learning models that provide
benchmarked evaluation.
Model specific: Model specific methods are restricted to specific types of machine learning models,
relying on the internal details of the model.
PII (Personally Identifiable Information): Any information that can be used to identify an individual,
such as full names, date of birth, address, phone number, and email address.
Post-hoc interpretability: Methods applied after a model is trained to provide insights into its behavior and explain its predictions.
Redaction: A de-identification technique that deletes all or parts of a sensitive value. It is not
reversible and does not maintain referential integrity.
Residuals: The difference between the model's predictions and the actual target values.
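In code, residuals are simply element-wise differences; the values below are illustrative.

```python
# Residuals: actual target values minus model predictions.
import numpy as np

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.5, 5.5, 7.0])

residuals = y_true - y_pred
print(residuals)  # [ 0.5 -0.5  0. ] -> systematic patterns signal model bias
```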
Safety classifiers: Machine learning systems that classify input text as safe or unsafe.
Training-serving skew: Discrepancies between training and serving data distributions, potentially
causing model performance issues.
List of Google products and solutions
Learn more about reference architectures, design guidance, and best practices for building,
migrating, and managing your cloud workloads at
cloud.google.com/architecture/all-reference-architectures.
App Engine: A platform for building scalable web applications and mobile backends.
BigQuery: The primary data analytics tool on Google Cloud; a fully managed data warehouse that
provides two services in one: storage plus analytics.
Cloud Monitoring: Collects and monitors metrics, events, and metadata from Google Cloud.
Cloud SQL: Google Cloud’s database service (relational database management service).
Cloud Storage: Google Cloud’s object storage service for structured, semi-structured, and
unstructured data.
Cloud VMware Engine: An engine for migrating and running VMware workloads natively on Google
Cloud.
Colab Enterprise: A notebook solution for users who don’t want to worry about managing compute, offering zero-configuration, serverless infrastructure.
Dataflow: A fully managed streaming analytics service that creates a pipeline to process both
streaming data and batch data.
Dataproc: A fully managed cloud service for running big data processing, analytics, and machine
learning workloads on Google Cloud.
Feature Store: A managed service that simplifies machine learning feature management by storing
and serving feature data directly from BigQuery.
Firebase: An app development software to build, improve, and grow mobile and web apps.
Gemini: Google's most recent foundation model, capable of handling multimodal data like text,
images, and video.
Google Kubernetes Engine: A managed service for running Kubernetes, the open source container orchestration system that automates application deployment, scaling, and management.
JupyterLab: A web-based interactive development environment for notebooks, code, and data.
Model Garden: A model library within Vertex AI that allows users to search, discover, and interact
with a variety of generative AI models, including both Google and open-source models.
Spanner: A fully managed Google Cloud database service designed for global scale.
TensorFlow: An end-to-end open source platform for machine learning, with a comprehensive,
flexible ecosystem of tools, libraries and community resources, originally created by Google.
Vertex AI: A unified platform for training, hosting, and managing ML models. Features include AutoML and custom training.
Vertex AI Studio: An intuitive interface within Vertex AI that provides tools for experimenting with,
tuning, and deploying generative AI models.