Session 5
ChengCheng Tan
Professional Machine Learning Engineer
Who am I?
ChengCheng Tan
- BA Linguistics & Computer Science, UCLA
- MS Computer Science, Stanford
- LLM + AI Safety
FAR AI Communications
AISafety.info + chatbot
- Google Women Techmakers Ambassador
Where are we on our journey?
Professional Machine Learning Certification Learning Journey
Organized by Google Developer Groups Surrey, co-hosted with GDG Seattle

Session 1 (Feb 24, 2024, Virtual)
- Lightning talk + Kick-off & Machine Learning Basics + Q&A
- Complete course: Introduction to AI and Machine Learning on Google Cloud; Launching into Machine Learning

Session 2 (Mar 2, 2024, Virtual)
- Lightning talk + GCP TensorFlow & Feature Engineering + Q&A
- Complete course: TensorFlow on Google Cloud; Feature Engineering

Session 3 (Mar 9, 2024, Virtual)
- Lightning talk + Enterprise Machine Learning + Q&A
- Complete course: Machine Learning in the Enterprise

Session 4 (Mar 16, 2024, Virtual)
- Production ML Systems and Computer Vision with Google Cloud + Q&A
- Hands On Lab Practice: Production Machine Learning Systems; Computer Vision Fundamentals with Google Cloud

Session 5 (Mar 23, 2024, Virtual)
- Lightning talk + NLP & Recommendation Systems on GCP + Q&A
- Complete course: Natural Language Processing on Google Cloud; Recommendation Systems on GCP

Session 6 (Apr 6, 2024, Virtual)
- Lightning talk + MLOps & ML Pipelines on GCP + Q&A
- Complete course: ML Ops - Getting Started; ML Pipelines on Google Cloud

Preparation and practice
- Review the Professional ML Engineer Exam Guide
- Review the Professional ML Engineer Sample Questions
- Go through: Google Cloud Platform Big Data and Machine Learning Fundamentals
- Hands On Lab Practice: Perform Foundational Data, ML, and AI Tasks in Google Cloud (Skill Badge) - 7hrs
- Hands On Lab Practice: Build and Deploy ML Solutions on Vertex AI (Skill Badge) - 8hrs

Self study (and potential exam)
- Check Readiness: Professional ML Engineer Sample Questions
1. LLM Overview
2. Intro to Gemini API
3. Session 5 Content Review
4. Sample Question Review
5. Q&A
LLM Overview
Natural Language
Processing [NLP]:
Computers Understand
Human Languages
Pre-1990s:
Rule-Based
Expert Systems
1990s-2000s:
Statistics &
Probabilities
bi-grams, tri-grams, n-grams
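To make the statistical approach concrete, here is a minimal sketch of an n-gram model. It counts bigrams in a tiny made-up corpus and turns the counts into next-word probabilities. The corpus and the function names are illustrative assumptions, not part of the talk, and real systems add smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bigram_probabilities(tokens):
    """Estimate P(next word | current word) from raw bigram counts (no smoothing)."""
    following = defaultdict(Counter)
    for current, nxt in ngrams(tokens, 2):
        following[current][nxt] += 1
    return {
        current: {word: count / sum(counts.values()) for word, count in counts.items()}
        for current, counts in following.items()
    }


# Toy corpus, assumed purely for illustration.
corpus = "the cat sat on the mat the cat ate".split()
probs = bigram_probabilities(corpus)
print(probs["the"])  # distribution over the words seen after "the"
```

Changing the second argument of `ngrams` to 3 gives tri-grams; the same counting idea extends to any n.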
"You shall know a word by the company it keeps."
J.R. Firth, Linguist
2010s:
Rise of Deep Learning
and Neural Networks
Woman
Man
Queen
King
V(king) - V(queen) + V(woman) = V(man)
2013: Word2Vec Embeddings
Analogies              Word Pair 1             Word Pair 2
Man-Woman              king, queen             man, woman
Capital city           Athens, Greece          Oslo, Norway
City-in-state          Chicago, Illinois       Sacramento, California
Opposite               possibly, impossibly    ethical, unethical
Comparative            great, greater          tough, tougher
Nationality adjective  Switzerland, Swiss      Canada, Canadian
Past tense             walking, walked         swimming, swam
Plural nouns           mouse, mice             dollar, dollars
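The analogy arithmetic can be sketched with hand-made vectors. The two-dimensional "embeddings" below are invented for illustration (one axis loosely stands for royalty, the other for gender); real Word2Vec vectors are learned from text and have hundreds of dimensions. The `analogy` helper is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def analogy(vectors, a, b, c):
    """Return the word closest to V(a) - V(b) + V(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (word for word in vectors if word not in (a, b, c))
    return max(candidates, key=lambda word: cosine(vectors[word], target))


# Toy vectors (assumed): [royalty, gender], with male = +1 and female = -1.
toy_vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "prince": [0.5, 1.0],
}

print(analogy(toy_vectors, "king", "queen", "woman"))  # V(king) - V(queen) + V(woman)
print(analogy(toy_vectors, "king", "man", "woman"))    # V(king) - V(man) + V(woman)
```

Excluding the three input words from the candidates is a common convention, since the nearest vector to the result is often one of the inputs.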
2010s:
Neural Networks
RNN, GRU, LSTM
Early Neural Networks
- Slow & Forgetful
2017: Transformers
- Self-Attention
- Algorithm+Data+Compute
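As a rough illustration of self-attention, the sketch below computes scaled dot-product attention over a few toy token vectors. It is a simplification under stated assumptions: queries, keys, and values are all the raw inputs (a real Transformer first applies learned projection matrices and uses multiple heads), and the input vectors are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    outputs, weights = [], []
    for query in x:
        # Score this token against every token, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of all token vectors.
        outputs.append([sum(wi * value[j] for wi, value in zip(w, x)) for j in range(d)])
    return outputs, weights


# Three toy token vectors, assumed for illustration.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every token attends to every other token in one step, which is why attention avoids the step-by-step forgetting of earlier recurrent networks.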
Transformer
Architecture
Encoder + Decoder
Rise of LLMs
>1 Billion Parameters
Trained for
Next Word Prediction
Emergent Abilities
Pre-trained Base
Generalist
vs
Fine-tuned Models
Specialists
RLHF:
Reinforcement Learning
from Human Feedback
Fine-tuned
- Follow Instructions
- Conversations
Intro to Gemini API
Generalized Multimodal
Intelligence Network
Prototyping with
Google
AI Studio
goo.gle/ai-dev
Get API Key
Treat like password
Create new
- Freeform prompt
- Structured prompt
- Chat prompt
Freeform Prompt
Blog post creator
Write a prompt as text and
image for the model to
auto continue.
Structured Prompt
Marketing description
Table-based interface for
more complex model
priming and prompting
Chat Prompt
Talk to snowman
Simulate a back & forth
conversation with a model
Get Code
- Choose Language
- Open in Colab
- Copy to Editor
Settings
Tokens
- Words or subwords
- Different LLM tokenizers
- Training data, context window
Temperature
- Selected by probability
- Between 0 and 1.0
- Diversity or "creativity"
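The effect of temperature can be sketched without calling any model. The snippet below rescales made-up next-token scores (logits) by a temperature before sampling. The logits, token names, and function names are assumptions for illustration, and treating temperature 0 as "always pick the top token" is one common convention rather than a statement about how any particular API implements it.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature == 0:
        # Convention assumed here: temperature 0 means greedy (argmax) decoding.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens, logits, temperature, rng=random):
    """Pick one token at random, weighted by its temperature-adjusted probability."""
    probs = apply_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical candidates for the next word and their raw scores.
tokens = ["backpack", "dragon", "spreadsheet"]
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0):
    print(t, [round(p, 3) for p in apply_temperature(logits, t)])
print(sample_token(tokens, logits, temperature=1.0))
```

At a low temperature nearly all the probability lands on the top-scoring token, while a value near 1.0 leaves more room for less likely, more "creative" choices.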
Settings
Model Sizes
Settings
Safety Ratings
Harm Categories
- Harassment
- Hate Speech
- Sexually Explicit
- Dangerous Content
Harm Probabilities
- NEGLIGIBLE
- LOW
- MEDIUM
- HIGH
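One way to act on these ratings is to compare each category's harm probability against a threshold. The sketch below is plain Python and does not call the Gemini API; the dictionary shape, the threshold default, and the `blocked_categories` helper are hypothetical, and only the category and probability names come from the slide.

```python
# Harm probabilities in increasing order of severity, as listed on the slide.
HARM_PROBABILITIES = ["NEGLIGIBLE", "LOW", "MEDIUM", "HIGH"]


def blocked_categories(ratings, threshold="MEDIUM"):
    """Return the harm categories rated at or above the threshold."""
    limit = HARM_PROBABILITIES.index(threshold)
    return [
        category
        for category, probability in ratings.items()
        if HARM_PROBABILITIES.index(probability) >= limit
    ]


# Hypothetical ratings for one model response.
example_ratings = {
    "Harassment": "NEGLIGIBLE",
    "Hate Speech": "LOW",
    "Sexually Explicit": "NEGLIGIBLE",
    "Dangerous Content": "MEDIUM",
}

print(blocked_categories(example_ratings))                  # default threshold
print(blocked_categories(example_ratings, threshold="LOW"))  # stricter threshold
```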
The hottest new
programming language is
English.
Andrej Karpathy
OpenAI
Prompt Engineering
- Clear & Specific Instructions
- Give Examples
- Step by Step
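These three tips can be combined in a small prompt builder. The sketch below assembles an instruction, a few worked examples, and an optional step-by-step cue into one prompt string. The `build_prompt` helper, the "Input:/Output:" layout, and the sample reviews are illustrative assumptions; the resulting string could be passed to any text model.

```python
def build_prompt(instruction, examples, query, step_by_step=False):
    """Assemble a few-shot prompt: instruction, worked examples, then the new input."""
    lines = [instruction.strip(), ""]
    for text, label in examples:
        lines += [f"Input: {text}", f"Output: {label}", ""]
    lines.append(f"Input: {query}")
    if step_by_step:
        lines.append("Think through the problem step by step, then give the answer.")
    lines.append("Output:")
    return "\n".join(lines)


# Hypothetical sentiment-labelling task.
examples = [
    ("The battery lasts all day.", "positive"),
    ("The screen cracked within a week.", "negative"),
]
prompt = build_prompt(
    "Classify each product review as positive or negative. Answer with one word.",
    examples,
    "Setup took five minutes and everything worked.",
    step_by_step=True,
)
print(prompt)
```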
REST APIs
Client libraries for
- Python
- JavaScript
- Android (Kotlin)
- Swift
- cURL
Setup
Install & import libraries
$ pip install google-generativeai
import google.generativeai as genai
genai.configure(api_key="<YOUR API KEY>")
Generate Text
Text only prompt
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content("Write a story about a boy and a backpack.")
print(response.text)
Generate Text
Text and image prompt
import PIL.Image
model = genai.GenerativeModel('gemini-pro-vision')
img = PIL.Image.open('image.jpg')
response = model.generate_content(["Write a blog based on this photo.", img])
print(response.text)
Chat Conversations
For interactive applications
model = genai.GenerativeModel('gemini-pro')
chat = model.start_chat(history=[])
response = chat.send_message("Hello, how are you?")
print(response.text)
Free for now.
Try it out!
goo.gle/ai-dev
Session 5 Content Review
Session 5
Study Group
ML Pipeline Automation & Orchestration
- Design pipeline.
- Implement training pipeline.
- Implement serving pipeline.
- Track and audit metadata.
- Use CI/CD to test and deploy models.
ML Solution Monitoring, Optimization, and Maintenance
- Monitor ML solutions.
- Troubleshoot ML solutions.
- Tune performance of ML solutions for training & serving in production.
Pipelines automate the training and deployment of models
Extract Data -> Validate Data -> Prepare Data -> Train Model -> Evaluate Model -> Validate Model -> Deploy Model
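A pipeline of this kind can be sketched as a chain of plain functions run in the order extract, validate data, prepare, train, evaluate, validate model, deploy. Everything below is a toy under stated assumptions: the data is synthetic, the "model" is a least-squares line, and the function names and the error threshold are invented. A production pipeline would run each stage as a tracked component in an orchestrator.

```python
def extract_data():
    """Stand-in for reading from a data source: points on the line y = 2x + 1."""
    return [(float(x), 2.0 * x + 1.0) for x in range(10)]


def validate_data(rows):
    """Fail fast if the data is empty or malformed."""
    if not rows or any(len(row) != 2 for row in rows):
        raise ValueError("data failed validation")
    return rows


def prepare_data(rows, train_fraction=0.8):
    """Split into training and test sets."""
    split = int(len(rows) * train_fraction)
    return rows[:split], rows[split:]


def train_model(train):
    """Fit y = a*x + b by ordinary least squares."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in train) / sum(
        (x - mean_x) ** 2 for x, _ in train
    )
    return a, mean_y - a * mean_x


def evaluate_model(model, test):
    """Mean squared error on held-out data."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in test) / len(test)


def validate_model(mse, max_mse=1e-6):
    """Gate deployment on an (assumed) quality threshold."""
    return mse <= max_mse


def run_pipeline():
    rows = validate_data(extract_data())
    train, test = prepare_data(rows)
    model = train_model(train)
    mse = evaluate_model(model, test)
    # "Deploy" here only records the decision; a real step would publish the model.
    return {"model": model, "mse": mse, "deployed": validate_model(mse)}


result = run_pipeline()
print(result)
```

The two validation gates are the important part: bad data stops the run before training, and a weak model never reaches deployment.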
From DevOps to MLOps
- Continuous Integration (CI) is no longer only about testing and validating code and components, but also testing and validating data, data schemas, and models.
- Continuous Deployment (CD) is no longer about a single software package or a service, but a system (ML training pipeline) that should automatically deploy another service (model prediction service).
- Continuous Training (CT) is a new property, specific to ML systems, concerning automatically retraining and serving the models.
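Continuous training needs some rule that decides when to retrain. One simple possibility, sketched below, is to retrain when the recent average of a monitored metric falls too far below the accuracy measured at deployment. The function name, window size, and tolerance are assumptions chosen for illustration; real systems may trigger on a schedule, on new data, or on drift statistics instead.

```python
def should_retrain(baseline_accuracy, recent_accuracies, tolerance=0.05, window=3):
    """Return True when the last `window` accuracy readings average
    more than `tolerance` below the baseline."""
    if len(recent_accuracies) < window:
        return False  # not enough evidence yet
    recent = recent_accuracies[-window:]
    return sum(recent) / window < baseline_accuracy - tolerance


# Hypothetical monitoring readings collected after deployment.
print(should_retrain(0.90, [0.90, 0.89, 0.91]))        # stable model
print(should_retrain(0.90, [0.90, 0.80, 0.80, 0.80]))  # degraded model
```

Averaging over a window rather than reacting to one reading avoids retraining on a single noisy measurement.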
In addition to the actual ML code, you have to worry about so much more:
- Configuration
- Data Collection
- Data Verification
- Feature Extraction
- Process Management Tools
- Analysis Tools
- Machine Resource Management
- Serving Infrastructure
- Monitoring
Source: Sculley et al., "Hidden Technical Debt in Machine Learning Systems"
TFX is the solution to this problem
- Data Ingestion
- Data Analysis + Validation
- Data Transformation
- Trainer
- Tuner
- Model Evaluation and Validation
- Serving
- Logging
- Shared Utilities for Garbage Collection, Data Access Controls
- Pipeline Storage
- Shared Configuration Framework and Job Orchestration
- Integrated Frontend for Job Management, Monitoring, Debugging, Data/Model/Evaluation Visualization
TensorFlow libraries behind these components:
- Data Analysis + Validation: TensorFlow Data Validation
- Data Transformation: TensorFlow Transform
- Trainer: Estimator or Keras Model
- Model Evaluation and Validation: TensorFlow Model Analysis
- Serving: TensorFlow Serving
- Tuner: Keras Tuner
TFX Components
- ExampleGen: Data Ingestion
- StatisticsGen, SchemaGen, Example Validator: TensorFlow Data Validation
- Transform: TensorFlow Transform
- Trainer: Estimator or Keras Model
- Evaluator: TensorFlow Model Analysis, Validation Outcomes
- Pusher: TensorFlow Serving, Model Server
- Logging
Certification Study Group - NLP & Recommendation Systems on GCP Session 5
Deployment Strategies
- Rolling Update
- Canary Deployment
- Blue-Green Deployment
Rolling Update
- Slowly increasing number of NEW (v2)
- Slowly decreasing number of OLD (v1)
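A rolling update can be pictured as replacing replicas a few at a time. The sketch below simulates a fleet of replicas as a list of version strings and records the state after each batch. The `rolling_update` helper and the batch size are illustrative assumptions; a real orchestrator would also wait for health checks between batches.

```python
def rolling_update(replicas, new_version, batch_size=1):
    """Yield the fleet state before the update and after each replaced batch."""
    fleet = list(replicas)
    yield list(fleet)
    for start in range(0, len(fleet), batch_size):
        for i in range(start, min(start + batch_size, len(fleet))):
            fleet[i] = new_version
        yield list(fleet)


# Three replicas running v1 are replaced one at a time by v2.
states = list(rolling_update(["v1", "v1", "v1"], "v2"))
for state in states:
    print(state)
```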
Canary Deployment
- Test on small SUBSET in production
- After testing, traffic shifts to NEW (v2)
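The canary idea comes down to weighted routing: a small, configurable share of requests goes to the new version. The sketch below is a minimal illustration; the `route_request` helper, the 10% fraction, and the seeded random generator are assumptions, and a real load balancer would usually keep a given user on the same version.

```python
import random
from collections import Counter


def route_request(canary_fraction, rng=random):
    """Send roughly `canary_fraction` of requests to v2 and the rest to v1."""
    return "v2" if rng.random() < canary_fraction else "v1"


# Simulate 10,000 requests with 10% of traffic on the canary.
rng = random.Random(42)  # seeded so the simulation is repeatable
counts = Counter(route_request(0.10, rng) for _ in range(10_000))
print(counts)
```

Raising `canary_fraction` step by step toward 1.0 is how traffic shifts to the new version once the canary looks healthy.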
Blue-Green Deployment
- 2 SEPARATE deployments (v1 and v2)
- After testing, switch traffic from OLD Blue to NEW Green.
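Blue-green deployment keeps both environments running and flips all traffic at once. The sketch below models that with a router holding two handlers and a single "live" pointer. The class name and the handler functions are hypothetical; in practice the switch is usually a load balancer or DNS change.

```python
class BlueGreenRouter:
    """Route every request to whichever environment is currently live."""

    def __init__(self, blue, green):
        self.environments = {"blue": blue, "green": green}
        self.live = "blue"  # OLD version serves traffic at first

    def handle(self, request):
        return self.environments[self.live](request)

    def switch(self):
        """Flip all traffic to the other environment (also used to roll back)."""
        self.live = "green" if self.live == "blue" else "blue"


# Hypothetical handlers standing in for the two separate deployments.
router = BlueGreenRouter(
    blue=lambda request: f"v1 handled {request}",
    green=lambda request: f"v2 handled {request}",
)
print(router.handle("req-1"))
router.switch()  # after testing green, cut over
print(router.handle("req-2"))
```

Because the old environment stays up, calling `switch()` again is an immediate rollback.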
Sample Questions Review
Which of the following are benefits of running TFX on Google Cloud? Select all that apply.
A. Vertex Pipelines is the only supported orchestrator for TFX pipelines
B. Simplify scaling of TFX pipeline data processing as your data grows
C. Automate your ML operational processes for individual and multiple ML pipelines.
D. Increase your pipeline development and experimentation velocity.
Answer: B, C, and D. TFX pipelines can also run on other orchestrators, such as Apache Airflow and Kubeflow Pipelines, so A is incorrect.
In addition to CI/CD practiced by DevOps teams, MLOps introduces:
A. Continuous classification
B. Continuous regression
C. Continuous training
D. All of the above.
Answer: C. Continuous training
In what order are the following phases executed in a machine learning project?
I - Selection of ML algorithm
II - Data Exploration
III - Definition of the business use case
IV - Model monitoring
V - Model operationalization
VI - Model Development
A. I, II, III, IV, V, VI
B. III, II, I, VI, V, IV
C. II, III, I, VI, IV, V
D. II, I, III, VI, IV, V
Answer: B. The phases run in this order:
III - Definition of the business use case
II - Data Exploration
I - Selection of ML algorithm
VI - Model Development
V - Model operationalization
IV - Model monitoring
Pipeline
You want to have two versions of your application in production, but be able to direct a
small percentage of traffic to the newer version as a gradual test. This is an example of
which deployment strategy?
A. Canary deployment
B. Blue-green deployment
C. Rolling updates
Answer: A. Canary deployment. A small share of production traffic goes to the new version while the old version keeps serving the rest.
Q&A
