15 APRIL 2021
Machine Learning Operations
On AWS
Who am I?
• Experienced principal solutions architect, lead developer and head of practice for Inawisdom.
• All 12 AWS Certifications, including SA Pro, DevOps Pro, and the Data Analytics and Machine Learning Specialisms.
• Over 6 years of AWS experience; I have been responsible for running production workloads of over 200 containers in a high-performance system that responded to 18,000 requests per second.
• Visionary in ML Ops; I have produced production workloads of ML models at scale, including 1,500 inferences per minute, with active monitoring and alerting.
• I have developed in Python, NodeJS and J2EE.
• One of the Ipswich AWS User Group leaders; I contribute to the AWS Community by speaking at several summits, community days and meet-ups.
• Regular blogger, open-source contributor, and SME on Machine Learning, MLOps, DevOps, Containers and Serverless.
• I work for Inawisdom (an AWS Partner) as a principal solutions architect and head of practice. I am Inawisdom’s AWS APN Ambassador and evangelist.
Phil Basford
phil@inawisdom.com
@philipbasford
#1 EMEA
© 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved |
The AWS ML Stack
Broadest and most complete set of Machine Learning capabilities
Use cases: VISION · SPEECH · TEXT · SEARCH · CHATBOTS · PERSONALIZATION · FORECASTING · FRAUD · DEVELOPMENT · CONTACT CENTERS
AI SERVICES: Amazon Rekognition, Amazon Polly, Amazon Transcribe (+Medical), Amazon Comprehend (+Medical), Amazon Translate, Amazon Lex, Amazon Personalize, Amazon Forecast, Amazon Fraud Detector, Amazon CodeGuru, Amazon Textract, Amazon Kendra, Contact Lens for Amazon Connect
ML SERVICES (Amazon SageMaker, with the SageMaker Studio IDE): Ground Truth, ML Marketplace, Neo, Augmented AI, built-in algorithms, Notebooks, Experiments, model training & tuning, Debugger, Autopilot, model hosting, Model Monitor
ML FRAMEWORKS & INFRASTRUCTURE: Deep Learning AMIs & Containers, GPUs & CPUs, Elastic Inference, Inferentia, FPGA, Deep Graph Library
ML LIFE CYCLE
Define the Problem and Value
Data Exploration: SageMaker Ground Truth, AWS Data Exchange, AWS ‘Lake House’, Open Data Sets
Experiment: SageMaker Notebooks, SageMaker Autopilot, ML Marketplace
Testing and Evaluation: SageMaker Debugger, SageMaker Experiments
Refinement: SageMaker Hyperparameter Tuning, SageMaker Notebooks
Inference: SageMaker Endpoints, SageMaker Batch Transform
Operationalize: SageMaker Model Monitor, AWS Step Functions Data Science SDK, SageMaker Pipelines
ARCHITECTURE
Operational Excellence: Monitoring, observing and alerting using CloudWatch and X-Ray. Infrastructure as Code with SAM and CloudFormation.
Security: Least privilege, data encryption at rest, and data encryption in transit using IAM policies, resource policies, KMS, Secrets Manager, VPCs and security groups.
Performance: Elastic scaling based on demand and meeting response times using Auto Scaling, serverless, and per-request managed services.
Cost Optimisation: Serverless and fully managed services to lower TCO. Resource-tag everything possible for cost analysis. Right-size instance types for model hosting.
Reliability: Fault tolerance and auto-healing to meet a target availability using Auto Scaling, Multi-AZ, multi-Region, read replicas and snapshots.
https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf
SERVERLESS
AWS Lambda: AWS’s native and fully managed cloud service for running application code without the need to run servers.
API Gateway: the endpoint for your API; it has extensive security measures, logging, and API definition using OpenAPI or Swagger.
DynamoDB: a fully managed NoSQL cloud service from AWS. For machine learning it is typically used for reference data.
S3: highly durable object storage used for many things, including data lakes. For machine learning it is used to store training data sets and model artefacts.
Also: SNS (pub + sub), SQS (queues), Fargate (containers), Step Functions (workflows) ..and more
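As a sketch of how these services combine at inference time — API Gateway in front of a Lambda that calls a SageMaker endpoint — the handler below is illustrative, not taken from the deck: the endpoint name and payload shape are assumptions, and the runtime client is injected so the sketch can be exercised without AWS access (in a real Lambda it would be `boto3.client("sagemaker-runtime")`).

```python
import json

def handler(event, context, runtime_client=None):
    """Sketch of an API Gateway -> Lambda -> SageMaker inference path.

    `runtime_client` stands in for boto3.client("sagemaker-runtime").
    "my-model-endpoint" and the {"instances": [...]} payload are
    illustrative assumptions, not a fixed contract.
    """
    body = json.loads(event["body"])          # API Gateway proxy payload
    features = body["features"]

    response = runtime_client.invoke_endpoint(
        EndpointName="my-model-endpoint",     # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps({"instances": [features]}),
    )
    prediction = json.loads(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(prediction)}
```

The same handler shape works behind SQS or SNS for the queue-based inference patterns later in the deck; only the event parsing changes.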
THE SOLUTION AND ARCHITECTURE
Remember to always apply least privilege and other AWS security best practices; be very protective of your data
SECURITY
AWS KMS: Encrypt everything! However, if your data is PII or PCI-DSS then consider using a dedicated customer-managed key in KMS to do this. It gives you tighter control by limiting the ability to decrypt data, providing another layer of security over S3.
AWS IAM: SageMaker, like EC2, is granted access to other AWS services using IAM roles, and you need to make sure your policies are locked down to only the Actions and Resources you need.
Amazon S3: SageMaker can use a range of data stores, but S3 is the most popular. Please make sure you enable encryption, resource policies, logging and versioning on your buckets.
Amazon VPC: SageMaker can run outside a VPC and access data over the public internet (hopefully using HTTPS). This runs contrary to most corporate information security policies, so please deploy in a VPC with private links for extra security.
Data: Most importantly, only use the data you need. If the data contains PII or PCI-DSS values you do not need, remove or sanitise them.
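A minimal sketch of how those controls land in a SageMaker CreateTrainingJob request — a customer-managed KMS key for the volume and output artefacts, a scoped IAM role, and VPC isolation. All names are placeholders; in real use you would pass the dict to `boto3.client("sagemaker").create_training_job(**request)`.

```python
def training_job_request(job_name, role_arn, image_uri, bucket,
                         kms_key_arn, subnets, security_group_ids):
    """Build a partial CreateTrainingJob request applying the security
    controls above. All ARNs/names are caller-supplied placeholders."""
    return {
        "TrainingJobName": job_name,
        "RoleArn": role_arn,                # least-privilege execution role
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "OutputDataConfig": {
            "S3OutputPath": f"s3://{bucket}/models/",
            "KmsKeyId": kms_key_arn,        # encrypt model artefacts in S3
        },
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
            "VolumeKmsKeyId": kms_key_arn,  # encrypt the training volume
        },
        "VpcConfig": {                      # keep traffic off the public internet
            "Subnets": subnets,
            "SecurityGroupIds": security_group_ids,
        },
        "EnableNetworkIsolation": True,
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```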
ML OPS PROCESSES
Dev Ops in Machine Learning
ML OPS
Trigger: New Data Available (data updates / drift detection)
Components — ETL, Data Pre-Processing, Training, Verification, Inference, Monitoring — with technology and considerations for each:
➤ ETL: structured, semi-structured and unstructured data. Technology: Spark, EMR, Glue, Matillion.
➤ Data Pre-Processing (including validation of data): Spark, scikit-learn, Containers, SageMaker Processing.
➤ Training: ML algorithms and frameworks; SageMaker training jobs.
➤ Verification: accuracy checks, Golden Data Set testing, model debugging.
➤ Inference: batch or real-time; SageMaker Endpoints, SageMaker Batch Transform, ECS/Docker and Functions, SageMaker Debugger.
➤ Monitoring: baselining / sampling predictions; model drift detection and model selection automation; SageMaker Model Monitor, CloudWatch.
Dev Ops in Machine Learning
ML OPS
Trigger: Verified Data Available (new data features / DS changes, script mode)
Components — ETL, Data Pre-processing, Training, Verification, Inference, Monitoring:
➤ ETL / Data Pre-processing: the data set used to train previously.
➤ Training: CI/CD is used to build the model code; SageMaker Experiments and hyperparameter tuning jobs surface recommended additions and potential changes.
➤ Roles involved across the pipeline: Data Scientist, ML Engineer, Source Control, DevOps.
TRAINING
Optimising training to meet the business needs
TRAINING
Trade-offs: cost, effort, speed/time, complexity.
Distributed Training: split large amounts of data into chunks, train the chunks across many instances, then combine the outputs at the end.
Multi-Job Training: used when a generalised model does not represent the characteristics of the data, or different hyperparameters are needed, i.e. per location or product group. This involves running multiple training processes for different data sets at the same time.
Data Parallelism: using many cores or instances to train algorithms, like GPT-3, that have billions of parameters.
Model Parallelism: splitting up training for a model that uses a Deep Learning algorithm with dense and/or a large number of layers, as a single GPU cannot handle it.
Pipe vs File: improving training times by loading data incrementally into models during training, instead of requiring a large amount of data to be downloaded before training can start.
Common Issues
Ø Training takes too long! We need it to take hours, not days.
Ø Training is costing lots of money and we are not sure if all the resources are being fully utilised.
Ø Our data set is too big and uses a lot of memory and network IO to process.
Ø We need to train hundreds of models at the same time.
Ø Client teams have limited experience in orchestration of training at scale.
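The distributed-training and Pipe-vs-File points above can be expressed in a single CreateTrainingJob input channel: `ShardedByS3Key` hands each training instance a different subset of the S3 objects (simple data parallelism), and Pipe mode streams data in rather than downloading it all first. This is a sketch with placeholder bucket/prefix names, not a complete job definition.

```python
def sharded_channel(bucket, prefix):
    """Build a training input channel that shards data across instances
    and streams it with Pipe mode. Bucket and prefix are placeholders."""
    return {
        "ChannelName": "train",
        "InputMode": "Pipe",  # stream incrementally instead of downloading up front
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/{prefix}",
                # Each instance receives a distinct shard of the objects:
                "S3DataDistributionType": "ShardedByS3Key",
            }
        },
    }
```

With `FullyReplicated` instead, every instance would see the whole data set — the right choice for model parallelism rather than data parallelism.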
INFERENCE
Inference types
ML OPS – INFERENCE TYPES
Real Time
➤ Business critical; common uses are chat bots, classifiers, recommenders or linear regressors — like credit risk, journey times etc.
➤ Hundreds or thousands of individual predictions per second
➤ API driven with low latency, typically below 135ms at the 90th percentile.
Near Real Time
➤ Commonly used for image classification or file analysis
➤ Hundreds of individual predictions per minute, and processing needs to be done within seconds
➤ Event or message queue based; predictions are sent back or stored
Occasional
➤ Examples are simple classifiers like tax codes
➤ Only a few predictions a month, and processing needs to be completed within minutes
➤ API, event or message queue based; predictions sent back or stored
Batch
➤ End of month reporting, invoice generation, warranty plan management
➤ Runs at daily / monthly / set times
➤ The data set is typically millions or tens of millions of rows at once
Micro Batch
➤ Anomaly detection, invoice approval and image processing
➤ Executed regularly: every X minutes or Y number of events. Triggered by file upload or data ingestion
➤ The data set is typically hundreds or thousands of rows at once
Edge
➤ Used for computer vision, fault detection in manufacturing
➤ Runs on mobile phone apps and low power devices. Uses sensors (i.e. video, location, or heat)
➤ Model output is normally sent back to the Cloud at regular intervals for analysis.
Endpoint
Docker containers host the inference engines, inference engines can be written in any language and endpoints can use
more than one container. Primary container needs to implement a simple REST API.
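That REST contract — GET /ping for health checks, POST /invocations for predictions on port 8080 — can be sketched as a minimal WSGI app. The "model" here is a stand-in that returns a fixed score; a real container would load the artefact from /opt/ml/model.

```python
import json

def app(environ, start_response):
    """Minimal sketch of the primary container's REST contract:
    GET /ping (health check) and POST /invocations (predictions).
    The constant 0.5 score is a dummy stand-in for a real model."""
    path = environ.get("PATH_INFO", "")
    if path == "/ping":
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b"{}"]
    if path == "/invocations" and environ.get("REQUEST_METHOD") == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(size))
        result = {"predictions": [0.5 for _ in payload["instances"]]}
        start_response("200 OK", [("Content-Type", "application/json")])
        return [json.dumps(result).encode()]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

Served behind Nginx/Gunicorn (or any WSGI server) on port 8080, this is the shape SageMaker probes and invokes.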
Common Engines:
➤ 685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:1
➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.11-cpu-py2
➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.11-gpu-py2
➤ 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-gpu
➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow-serving:1.11-cpu
AMAZON SAGEMAKER – INFERENCE ENGINES
Dockerfile:
FROM tensorflow/serving:latest
RUN apt-get update && apt-get install -y --no-install-recommends nginx git
RUN mkdir -p /opt/ml/model
COPY nginx.conf /etc/nginx/nginx.conf
ENTRYPOINT service nginx start | tensorflow_model_server --rest_api_port=8501 --model_config_file=/opt/ml/model/models.config
Container layout: the primary container serves http://localhost:8080/invocations and http://localhost:8080/ping. Amazon SageMaker downloads model.tar.gz and links it to /opt/ml/model inside the primary container, where Nginx fronts Gunicorn hosting the model runtime. Custom metadata can be passed via the X-Amzn-SageMaker-Custom-Attributes header.
Logical components of an endpoint within Amazon SageMaker
AMAZON SAGEMAKER – REAL TIME INFERENCE
All components are immutable; any configuration change requires new models and endpoint configurations. However, there is a specific SageMaker API to update instance count and variant weight.
Endpoint → Endpoint Configuration (name) → one or more Production Variants, each specifying a Model, Instance Type, and Initial Count + Weight. Each variant runs an Inference Engine + Model (a primary container plus optional additional containers) inside a VPC, with access to S3 and KMS + IAM. Clients invoke the endpoint via SDKs or REST with SigV4-signed requests.
The following shows the same experiment with M5 instances and autoscaling enabled:
M5 INSTANCES WITH AUTOSCALING
The autoscaling group was set between 2-4 instances and the scaling policy to 100k requests. The number of invocations continued to rise and CPU never went above 100%. A scaling event happened at 08:45 and took 5 minutes to warm up. No instances crashed, and up to 4 instances were used.
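The setup behind that experiment can be sketched as the two Application Auto Scaling calls for a SageMaker variant: register the variant's instance count as a scalable target (2-4 instances) and attach a target-tracking policy on invocations per instance. The 100k target mirrors the slide's scaling policy; endpoint and policy names are placeholders, and in real use each dict is passed to `boto3.client("application-autoscaling")`.

```python
def autoscaling_requests(endpoint_name, variant_name="AllTraffic"):
    """Build RegisterScalableTarget and PutScalingPolicy requests for a
    SageMaker endpoint variant. Names and limits are illustrative."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 2,
        "MaxCapacity": 4,
    }
    policy = {
        "PolicyName": "invocations-per-instance",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": 100000.0,  # invocations per instance, per the slide
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 300,  # allow for the ~5 minute warm-up observed
        },
    }
    return target, policy
```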
The following chart compares the two M5 based experiments:
WHY IS CPU USAGE THAT IMPORTANT?
Latency (red) increased when the CPU went over 100%. This is due to invocations having to wait within SageMaker to be processed.
Zzzzz, Phil does sleep!
The two M5 experiments had a cost of $42.96. SageMaker Studio was used instead of a SageMaker notebook instance.
The following are the four ways to deploy new versions of models in Amazon SageMaker
DEV OPS WITH SAGEMAKER
Rolling: the default option. SageMaker will start new instances and then, once they are healthy, stop the old ones.
Canary: canary deployments are done using two variants (canary and full) in the Endpoint Configuration and performed over two CloudFormation updates.
Blue/Green: requires two CloudFormation stacks, then changing the endpoint name in the AWS Lambda using an environment variable.
Linear: uses two weighted variants (new and old) in the Endpoint Configuration, with an AWS Step Function and AWS Lambda calling the UpdateEndpointWeightsAndCapacities API.
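A sketch of the request such a Lambda would build for UpdateEndpointWeightsAndCapacities, shifting traffic between two variants during a linear or canary rollout. The variant names are illustrative assumptions; the dict would be passed to `boto3.client("sagemaker").update_endpoint_weights_and_capacities(**request)`.

```python
def weight_update_request(endpoint_name, canary_weight):
    """Build an UpdateEndpointWeightsAndCapacities request that sends
    `canary_weight` of traffic to the new variant. Variant names
    ("new-model"/"old-model") are hypothetical."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": "new-model", "DesiredWeight": canary_weight},
            {"VariantName": "old-model", "DesiredWeight": 1.0 - canary_weight},
        ],
    }
```

A Step Functions loop can call this repeatedly (0.1, 0.25, 0.5, 1.0), checking CloudWatch error metrics between steps and resetting the weights to roll back.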
MONITORING
Cost optimisation for training and inference
ML OPS – A 360°
Cost drivers: change in instance size, change in instance type, and no RIs or Savings Plans for ML.
[Cost charts: daily spend from Feb 2020 to Jan 2021 split by Inference / Training / Notebooks, with monthly and yearly views; overall split roughly Inference 57%, Training 15%, Notebooks 28%.]
Top Tips
➤ Spot instances (surplus capacity from cloud providers) are cheaper for workloads that can handle being rerun, like batch or training. For longer execution times, consider using spot instances with model checkpointing.
➤ Models that require GPU for training justify additional consideration due to the use of more expensive instance types.
➤ For GPUs, analysis of the utilization of the GPU cores and memory is needed; CPU and network IO all need looking at too. Make sure you feed the GPUs enough data without bottlenecking.
➤ Multi-model support allows more than one model to be hosted on the same instance. This is very efficient for hosting many small models (e.g. a model per city), as hosting one per instance would give poor resource utilisation.
Business Performance and KPIs
KPIS AND MODEL MONITORING
➤ The most important measure of a model is whether it is accomplishing what it set out to achieve
➤ This is judged by setting some clear KPIs and tracking how the model affects them.
➤ This can be done a number of ways; one of the simplest and most impactful is constructing a dashboard in a BI tool like QuickSight
Model Performance
➤ SageMaker Model Monitor can be used to baseline a model and detect drift
➤ Another important aspect to monitor is that predictions are within known boundaries
➤ Performance monitoring of the model can trigger retraining when issues arise
An AWS CloudWatch dashboard provides complete oversight of the inference process
PERFORMANCE MONITORING
Dashboard panels: API error and success rates; API Gateway response times using percentiles; Lambda executions; availability recorded from a health checker; API usage data per usage plan.
X-Ray traces can help you spot bottlenecks and costly areas of the code, including inside your models.
OBSERVING INFERENCE
[X-Ray service map: APIGWUrl → inference functions → model, fanning out to downstream functions and a SQL call (db_url).]
Amazon SageMaker exposes metrics to AWS CloudWatch
MONITORING SAGEMAKER
Name | Dimension | Statistic | Threshold | Time Period | Missing
Endpoint model latency | Milliseconds | Average | > 100 | For 5 minutes | ignore
Endpoint model invocations | Count | Sum | > 10000 / < 1000 | For 15 minutes | notBreaching / breaching
Endpoint disk usage | % | Average | > 90% / > 80% | For 15 minutes | ignore
Endpoint CPU usage | % | Average | > 90% / > 80% | For 15 minutes | ignore
Endpoint memory usage | % | Average | > 90% / > 80% | For 15 minutes | ignore
Endpoint 5XX errors | Count | Sum | > 10 | For 5 minutes | notBreaching
Endpoint 4XX errors | Count | Sum | > 50 | For 5 minutes |
The metrics in AWS CloudWatch can then be used for alarms:
➤ Always pay attention to how to handle missing data
➤ Always test your alarms
➤ Look to level your alarms
➤ Make your alarms complement each other
AUTOMATION
Using automation and tools to deploy models and to maintain consistency
AUTOMATION AND PIPELINES
Environments: Experiments → Development → Pre-Production → Production, built on a Data Foundation and Infrastructure, with Governance and Control throughout.
Foundations:
➤ A solid Data Lake/Warehouse with good sources of data is required for long term scaling of ML usage
➤ Running models operationally also means considering availability, fault tolerance and scaling of instances.
➤ Having a robust security posture using multiple layers with auditability is essential
➤ Consistent architecture, development approaches and deployments aid maintainability
Scaling and refinement:
➤ Did your models improve, or do they still meet, the outcomes and KPIs that you set out to affect?
➤ Have innovations in technology meant that complexity in development or deployment can be simplified, allowing more focus to be put on other uses of ML?
➤ Are your models running on the latest and most optimal hardware?
➤ Do you need a feature store to improve collaboration and sharing of features?
➤ Do you need a model registry for control and governance?
AWS Step Functions Data Science Software Development Kit
MODEL RETRAINING
AWS Glue: used for raw data ingress, cleaning that data and then transforming it into a training data set.
Amazon SageMaker training jobs: the ability to run training on the data that the pipeline has prepared for you.
Deployments to Amazon SageMaker endpoints: the ability to perform deployments from the pipeline, including blue/green, linear and canary style updates.
AWS Lambda: used to stitch elements together and perform any additional logic.
AWS ECS/Fargate: there are situations where you may need to run very long running processes over the data to prepare it for training. Lambda is not suitable for this due to its maximum execution time and memory limits, so Fargate is preferred in these situations.
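The pipeline above can be sketched as a hand-written Amazon States Language definition: Glue prepares the data, SageMaker trains, then a Lambda performs the deployment. The Step Functions Data Science SDK generates definitions of this shape from Python; job and function names here are placeholders, and the training step is deliberately skeletal.

```python
def retraining_definition(glue_job_name, lambda_arn):
    """Build a minimal ASL state machine for Glue -> SageMaker training
    -> Lambda deployment. Names/ARNs are caller-supplied placeholders."""
    return {
        "StartAt": "PrepareData",
        "States": {
            "PrepareData": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                "Next": "TrainModel",
            },
            "TrainModel": {
                "Type": "Task",
                "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
                # Real use supplies the full CreateTrainingJob parameters here:
                "Parameters": {"TrainingJobName.$": "$.jobName"},
                "Next": "DeployModel",
            },
            "DeployModel": {
                "Type": "Task",
                "Resource": lambda_arn,  # Lambda performing the endpoint update
                "End": True,
            },
        },
    }
```

The `.sync` integrations make Step Functions wait for the Glue job and training job to finish before moving on, which is what makes the retraining pipeline a single traceable execution.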
REFERENCES
re:Invent and Webinar:
➤ https://pages.awscloud.com/GLOBAL-PTNR-OE-IPC-AIML-Inawisdom-Oct-2019-reg-event.html
➤ https://www.youtube.com/watch?v=lx9fP_4yi2s
My blogs:
➤ https://www.inawisdom.com/machine-learning/amazon-sagemaker-endpoints-inference/
➤ https://www.inawisdom.com/machine-learning/machine-learning-performance-more-than-skin-deep/
➤ https://www.inawisdom.com/machine-learning/a-model-is-for-life-not-just-for-christmas/
Other:
➤ https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html
➤ https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data
➤ https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/readmelink.html#getting-started-with-sample-jupyter-notebooks
QUESTIONS
020 3575 1337
info@inawisdom.com
Columba House,
Adastral Park, Martlesham Heath
Ipswich, Suffolk, IP5 3RE
www.inawisdom.com
@philipbasford
More Related Content

What's hot (20)

PDF
Using MLOps to Bring ML to Production/The Promise of MLOps
Weaveworks
 
PDF
What is MLOps
Henrik Skogström
 
PPTX
MLOps.pptx
AllenPeter7
 
PPTX
From Data Science to MLOps
Carl W. Handlin
 
PPTX
MLOps - The Assembly Line of ML
Jordan Birdsell
 
PPTX
MLOps in action
Pieter de Bruin
 
PDF
MLOps Bridging the gap between Data Scientists and Ops.
Knoldus Inc.
 
PDF
MLOps Using MLflow
Databricks
 
PPTX
introduction Azure OpenAI by Usama wahab khan
Usama Wahab Khan Cloud, Data and AI
 
PDF
Seamless MLOps with Seldon and MLflow
Databricks
 
PDF
MLOps by Sasha Rosenbaum
Sasha Rosenbaum
 
PDF
The A-Z of Data: Introduction to MLOps
DataPhoenix
 
PDF
Generative AI con Amazon Bedrock.pdf
Guido Maria Nebiolo
 
PDF
Databricks Overview for MLOps
Databricks
 
PDF
Generative AI
All Things Open
 
PDF
LanGCHAIN Framework
Keymate.AI
 
PPTX
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
Databricks
 
PDF
Leveraging Generative AI & Best practices
DianaGray10
 
PPTX
Generative AI, WiDS 2023.pptx
Colleen Farrelly
 
PDF
MLOps for production-level machine learning
cnvrg.io AI OS - Hands-on ML Workshops
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Weaveworks
 
What is MLOps
Henrik Skogström
 
MLOps.pptx
AllenPeter7
 
From Data Science to MLOps
Carl W. Handlin
 
MLOps - The Assembly Line of ML
Jordan Birdsell
 
MLOps in action
Pieter de Bruin
 
MLOps Bridging the gap between Data Scientists and Ops.
Knoldus Inc.
 
MLOps Using MLflow
Databricks
 
introduction Azure OpenAI by Usama wahab khan
Usama Wahab Khan Cloud, Data and AI
 
Seamless MLOps with Seldon and MLflow
Databricks
 
MLOps by Sasha Rosenbaum
Sasha Rosenbaum
 
The A-Z of Data: Introduction to MLOps
DataPhoenix
 
Generative AI con Amazon Bedrock.pdf
Guido Maria Nebiolo
 
Databricks Overview for MLOps
Databricks
 
Generative AI
All Things Open
 
LanGCHAIN Framework
Keymate.AI
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
Databricks
 
Leveraging Generative AI & Best practices
DianaGray10
 
Generative AI, WiDS 2023.pptx
Colleen Farrelly
 
MLOps for production-level machine learning
cnvrg.io AI OS - Hands-on ML Workshops
 

Similar to Ml ops on AWS (20)

PDF
Ml 3 ways
PhilipBasford
 
PDF
Amazon SageMaker workshop
Julien SIMON
 
PDF
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...
Provectus
 
PPTX
Machine Learning: From Notebook to Production with Amazon Sagemaker (January ...
Julien SIMON
 
PPTX
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
Julien SIMON
 
PPTX
Demystifying Machine Learning with AWS (ACD Mumbai)
AWS User Group Pune
 
PPTX
WhereML a Serverless ML Powered Location Guessing Twitter Bot
Randall Hunt
 
PPTX
Where ml ai_heavy
Randall Hunt
 
PDF
Mcl345 re invent_sagemaker_dmbanga
Dan Romuald Mbanga
 
PPTX
Quickly and easily build, train, and deploy machine learning models at any scale
AWS Germany
 
PDF
AWS Certified Machine Learning Study Guide Specialty MLS C01 Exam 1st Edition...
kiyyadaros
 
PPTX
AWS re:Invent 2018 - Machine Learning recap (December 2018)
Julien SIMON
 
PPTX
Advanced Machine Learning with Amazon SageMaker
Julien SIMON
 
PPTX
An Introduction to Amazon SageMaker (October 2018)
Julien SIMON
 
PPTX
Deep Dive Amazon SageMaker
Cobus Bernard
 
PDF
Amazon reInvent 2020 Recap: AI and Machine Learning
Chris Fregly
 
PDF
AWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed Raafat
AWS Riyadh User Group
 
PPT
Strata CA 2019: From Jupyter to Production Manu Mukerji
Manu Mukerji
 
PPTX
Amazon SageMaker (December 2018)
Julien SIMON
 
PDF
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Codiax
 
Ml 3 ways
PhilipBasford
 
Amazon SageMaker workshop
Julien SIMON
 
Data Summer Conf 2018, “Build, train, and deploy machine learning models at s...
Provectus
 
Machine Learning: From Notebook to Production with Amazon Sagemaker (January ...
Julien SIMON
 
AWS re:Invent 2018 - ENT321 - SageMaker Workshop
Julien SIMON
 
Demystifying Machine Learning with AWS (ACD Mumbai)
AWS User Group Pune
 
WhereML a Serverless ML Powered Location Guessing Twitter Bot
Randall Hunt
 
Where ml ai_heavy
Randall Hunt
 
Mcl345 re invent_sagemaker_dmbanga
Dan Romuald Mbanga
 
Quickly and easily build, train, and deploy machine learning models at any scale
AWS Germany
 
AWS Certified Machine Learning Study Guide Specialty MLS C01 Exam 1st Edition...
kiyyadaros
 
AWS re:Invent 2018 - Machine Learning recap (December 2018)
Julien SIMON
 
Advanced Machine Learning with Amazon SageMaker
Julien SIMON
 
An Introduction to Amazon SageMaker (October 2018)
Julien SIMON
 
Deep Dive Amazon SageMaker
Cobus Bernard
 
Amazon reInvent 2020 Recap: AI and Machine Learning
Chris Fregly
 
AWS reinvent 2019 recap - Riyadh - AI And ML - Ahmed Raafat
AWS Riyadh User Group
 
Strata CA 2019: From Jupyter to Production Manu Mukerji
Manu Mukerji
 
Amazon SageMaker (December 2018)
Julien SIMON
 
Julien Simon, Principal Technical Evangelist at Amazon - Machine Learning: Fr...
Codiax
 
Ad

More from PhilipBasford (19)

PPTX
Gartner Talk on AI Transformation & Innovation
PhilipBasford
 
PDF
AWS Construction Event for Gen AI and Connected Data Lakes - Jun 2024
PhilipBasford
 
PDF
AWS Summit London 2024 - Cognizant Partner Spotlight - Cognitive Architecture...
PhilipBasford
 
PDF
re:cap Generative AI journey with Bedrock
PhilipBasford
 
PDF
AIM102-S_Cognizant_CognizantCognitive
PhilipBasford
 
PDF
Inawisdom IDP
PhilipBasford
 
PDF
Inawisdom MLOPS
PhilipBasford
 
PDF
Inawisdom Quick Sight
PhilipBasford
 
PDF
Inawsidom - Data Journey
PhilipBasford
 
PDF
Realizing_the_real_business_impact_of_gen_AI_white_paper.pdf
PhilipBasford
 
PDF
Gen AI Cognizant & AWS event presentation_12 Oct.pdf
PhilipBasford
 
PDF
Inawisdom Overview - construction.pdf
PhilipBasford
 
PDF
D3 IDP Slides.pdf
PhilipBasford
 
PDF
C04 Driving understanding from Documents and unstructured data sources final.pdf
PhilipBasford
 
PPTX
Securing your Machine Learning models
PhilipBasford
 
PPTX
Fish Cam.pptx
PhilipBasford
 
PDF
Palringo AWS London Summit 2017
PhilipBasford
 
PDF
Palringo : a startup's journey from a data center to the cloud
PhilipBasford
 
PPTX
Machine learning at scale with aws sage maker
PhilipBasford
 
Gartner Talk on AI Transformation & Innovation
PhilipBasford
 
AWS Construction Event for Gen AI and Connected Data Lakes - Jun 2024
PhilipBasford
 
AWS Summit London 2024 - Cognizant Partner Spotlight - Cognitive Architecture...
PhilipBasford
 
re:cap Generative AI journey with Bedrock
PhilipBasford
 
AIM102-S_Cognizant_CognizantCognitive
PhilipBasford
 
Inawisdom IDP
PhilipBasford
 
Inawisdom MLOPS
PhilipBasford
 
Inawisdom Quick Sight
PhilipBasford
 
Inawsidom - Data Journey
PhilipBasford
 
Realizing_the_real_business_impact_of_gen_AI_white_paper.pdf
PhilipBasford
 
Gen AI Cognizant & AWS event presentation_12 Oct.pdf
PhilipBasford
 
Inawisdom Overview - construction.pdf
PhilipBasford
 
D3 IDP Slides.pdf
PhilipBasford
 
C04 Driving understanding from Documents and unstructured data sources final.pdf
PhilipBasford
 
Securing your Machine Learning models
PhilipBasford
 
Fish Cam.pptx
PhilipBasford
 
Palringo AWS London Summit 2017
PhilipBasford
 
Palringo : a startup's journey from a data center to the cloud
PhilipBasford
 
Machine learning at scale with aws sage maker
PhilipBasford
 
Ad

Recently uploaded (20)

PDF
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
PDF
Empower Inclusion Through Accessible Java Applications
Ana-Maria Mihalceanu
 
PDF
Impact of IEEE Computer Society in Advancing Emerging Technologies including ...
Hironori Washizaki
 
PDF
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
PPTX
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
PDF
NewMind AI Journal - Weekly Chronicles - July'25 Week II
NewMind AI
 
PPTX
Top iOS App Development Company in the USA for Innovative Apps
SynapseIndia
 
PDF
Jak MŚP w Europie Środkowo-Wschodniej odnajdują się w świecie AI
dominikamizerska1
 
PDF
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
PPTX
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
PPTX
Building a Production-Ready Barts Health Secure Data Environment Tooling, Acc...
Barts Health
 
PDF
Human-centred design in online workplace learning and relationship to engagem...
Tracy Tang
 
PDF
Why Orbit Edge Tech is a Top Next JS Development Company in 2025
mahendraalaska08
 
PDF
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
PDF
New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
PDF
Building Real-Time Digital Twins with IBM Maximo & ArcGIS Indoors
Safe Software
 
PDF
Persuasive AI: risks and opportunities in the age of digital debate
Speck&Tech
 
PDF
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
PDF
Smart Air Quality Monitoring with Serrax AQM190 LITE
SERRAX TECHNOLOGIES LLP
 
PPTX
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
Empower Inclusion Through Accessible Java Applications
Ana-Maria Mihalceanu
 
Impact of IEEE Computer Society in Advancing Emerging Technologies including ...
Hironori Washizaki
 
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
NewMind AI Journal - Weekly Chronicles - July'25 Week II
NewMind AI
 
Top iOS App Development Company in the USA for Innovative Apps
SynapseIndia
 
Jak MŚP w Europie Środkowo-Wschodniej odnajdują się w świecie AI
dominikamizerska1
 
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
Building a Production-Ready Barts Health Secure Data Environment Tooling, Acc...
Barts Health
 
Human-centred design in online workplace learning and relationship to engagem...
Tracy Tang
 
Why Orbit Edge Tech is a Top Next JS Development Company in 2025
mahendraalaska08
 
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
Building Real-Time Digital Twins with IBM Maximo & ArcGIS Indoors
Safe Software
 
Persuasive AI: risks and opportunities in the age of digital debate
Speck&Tech
 
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
Smart Air Quality Monitoring with Serrax AQM190 LITE
SERRAX TECHNOLOGIES LLP
 
Building Search Using OpenSearch: Limitations and Workarounds
Sease
 

Ml ops on AWS

  • 1. 15 APRIL 2021 Machine Learning Operations On AWS
  • 2. Who I am? • Experienced principal solutions architect, a lead developer and head of practice for Inawisdom. • All 12 AWS Certifications including SA Pro, Dev Ops Data Analytics Specialism, and Machine Learning Specialism. • Over 6 years of AWS experience and he has been responsible for running production workloads of over 200 containers in a performance system that responded to 18,000 requests per second • Visionary in ML Ops, Produced production workloads of ML models at scale, including 1500 inferences per minute, including active monitoring and alerting • Has developed in Python, NodeJS + J2EE • I am one of the Ipswich AWS User Group Leaders and contributes to the AWS Community by speaking at several summits, community days and meet-ups. • Regular blogger, open-source contributor, and SME on Machine Learning, MLOps, DevOps, Containers and Serverless. • I work for Inawisdom (an AWS Partner) as a principal solutions architect and head of practice. I am Inawisdom’s AWS APN Ambassador and evangelist. Phil Basford [email protected] @philipbasford #1 EMEA
  • 3. 2 © 2020 Amazon Web Services, Inc. or its affiliates. All rights reserved | The AWS ML Stack Broadest and most complete set of Machine Learning capabilities VISION SPEECH TEXT SEARCH CHATBOTS PERSONALIZATION FORECASTING FRAUD DEVELOPMENT CONTACT CENTERS Ground Truth ML Marketplace Neo Augmented AI Built-in algorithms Notebooks Experiments Model training & tuning Debugger Autopilot Model hosting Model Monitor Deep Learning AMIs & Containers GPUs & CPUs Elastic Inference Inferentia FPGA Amazon Rekognition Amazon Polly Amazon Transcribe +Medical Amazon Comprehend +Medical Amazon Translate Amazon Lex Amazon Personalize Amazon Forecast Amazon Fraud Detector Amazon CodeGuru AI SERVICES ML SERVICES ML FRAMEWORKS & INFRASTRUCTURE Amazon Textract Amazon Kendra Contact Lens For Amazon Connect SageMaker Studio IDE Amazon SageMaker DeepGraphLibrary
  • 4. 4 ML LIFE CYCLE Data Exploration SageMaker Ground Truth AWS Data Exchange AWS ‘Lake House’ Open Data Sets Experiment SageMaker Notebooks SageMaker Auto Pilot ML Market Place Testing and Evolution SageMaker Debugger SageMaker Experiments Refinement SageMaker Hyperparameter Tuning SageMaker Notebooks Inference SageMaker Endpoints SageMaker Batch Transform Operationalize SageMaker Model Monitor AWS Step Functions Data Science SDK SageMaker Pipelines Define the Problem and Value
  • 6. 6 Monitoring, observing and alerting using CloudWatch and X- Ray. Infrastructure as Code with SAM and CloudFormation. Operational Excellence Least privilege, Data Encryption at Rest, and Data Encryption in Transit using IAM Policies, Resource Policies, KMS, Secret Manager, VPC and Security Group. Security Elastic scaling based on demand and meeting response times using Auto Scaling, Serverless, and Per Request managed services. Performance Serverless and fully managed services to lower TCO. Resource Tag everything possible for cost analysis. Right sizing instance types for model hosting. Cost Optimisation Fault tolerance and auto healing to meet a target availability using Auto Scaling, Multi AZ, Multi Region, Read Replicas and Snapshots. Reliance https://d1.awsstatic.com/whitepapers/architecture/wellarchitected-Machine-Learning-Lens.pdf
  • 7. SERVERLESS
  AWS Lambda: AWS’s native and fully managed cloud service for running application code without the need to run servers.
  API Gateway: the endpoint for your API, with extensive security measures, logging, and API definition using OpenAPI (Swagger).
  DynamoDB: a fully managed NoSQL cloud service from AWS. For machine learning it is typically used for reference data.
  S3: highly durable object storage used for many things including data lakes. For machine learning it is used to store training data sets and model artefacts.
  Also: SNS (pub/sub), SQS (queues), Fargate (containers), Step Functions (workflows) and more.
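To show how these pieces fit together, here is a minimal sketch of a Lambda function behind API Gateway that forwards a request to a SageMaker endpoint. The endpoint name, the CSV content type and the JSON request shape are illustrative assumptions, not part of the deck:

```python
import json

ENDPOINT_NAME = "demo-endpoint"  # hypothetical endpoint name

def build_payload(features):
    """Serialise a feature vector as the CSV body that many built-in
    SageMaker algorithms (e.g. XGBoost) accept."""
    return ",".join(str(f) for f in features)

def parse_prediction(body):
    """Turn a raw single-value endpoint response body into a float."""
    return float(body.strip())

def handler(event, context):
    """Lambda entry point: API Gateway request in, prediction out.
    Needs sagemaker:InvokeEndpoint permission on the function role."""
    import boto3  # available in the AWS Lambda runtime
    runtime = boto3.client("sagemaker-runtime")
    features = json.loads(event["body"])["features"]
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="text/csv",
        Body=build_payload(features),
    )
    score = parse_prediction(response["Body"].read().decode("utf-8"))
    return {"statusCode": 200, "body": json.dumps({"score": score})}
```

Keeping the serialisation helpers separate from the boto3 call makes them unit-testable without AWS credentials.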
  • 8. THE SOLUTION AND ARCHITECTURE
  • 9. SECURITY — remember to always apply least privilege and other AWS security best practice; be very protective of your data.
  AWS KMS: encrypt everything! However, if your data is PII or PCI-DSS then consider using a dedicated customer managed key in KMS to do this. This allows you tighter control by limiting the ability to decrypt data, providing another layer of security over S3.
  AWS IAM: SageMaker, like EC2, is granted access to other AWS services using IAM roles, and you need to make sure your policies are locked down to only the Actions and Resources you need.
  Amazon S3: SageMaker can use a range of data stores, but S3 is the most popular. Make sure you enable encryption, resource policies, logging and versioning on your buckets.
  Amazon VPC: SageMaker can run outside a VPC and access data over the public internet (hopefully using HTTPS). This runs contrary to most corporate information security policies, so deploy in a VPC with private links for extra security.
  Data: most importantly, only use the data you need. If the data contains PII or PCI-DSS values that you do not need, remove or sanitise them.
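The S3 and KMS advice above can be enforced rather than just documented. The sketch below builds an illustrative bucket policy that denies uploads not encrypted with a specific customer managed KMS key and denies any non-TLS access; the bucket and key names are hypothetical, and you would attach the result with `s3.put_bucket_policy`:

```python
def build_secure_bucket_policy(bucket, kms_key_arn):
    """Bucket policy denying unencrypted uploads, uploads with the
    wrong KMS key, and any access over plain HTTP."""
    arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{arn}/*",
                "Condition": {"StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "aws:kms"}},
            },
            {
                "Sid": "DenyWrongKey",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"{arn}/*",
                "Condition": {"StringNotEquals": {
                    "s3:x-amz-server-side-encryption-aws-kms-key-id":
                        kms_key_arn}},
            },
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [arn, f"{arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }
```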
  • 11. ML OPS — DevOps in Machine Learning, by component and technology:
  New Data: structured, semi-structured and unstructured; watch for data updates / drift detection.
  ETL: Spark, EMR, Glue, Matillion.
  Pre-Processing: Spark, scikit-learn, containers, SageMaker Processing; including validation of data.
  Training: ML algorithms and frameworks; SageMaker training jobs.
  Verification: accuracy checks, golden data set testing, model debugging; SageMaker Debugger.
  Inference: batch or real-time; SageMaker Endpoints, SageMaker Batch Transform, ECS/Docker and Lambda functions.
  Monitoring: baselining / sampling predictions, model drift detection, model selection automation; SageMaker Model Monitor, CloudWatch.
  • 12. ML OPS — DevOps in Machine Learning, recommended additions and potential changes:
  New Data: verified data available; the data set used to train previously.
  Pre-processing: features / data science changes (script mode).
  Training: CI/CD is used to build the model code; SageMaker Experiments and hyperparameter tuning jobs.
  Roles and tooling: Data Scientist, ML Engineer and DevOps collaborating through source control.
  • 14. TRAINING — optimising training to reach the business needs, trading off cost, effort, speed/time and complexity.
  Common issues:
  Ø Training takes too long! We need it to take hours, not days.
  Ø Training is costing lots of money and we are not sure if all the resources are being fully utilised.
  Ø Our data set is too big and uses a lot of memory and network IO to process.
  Ø We need to train hundreds of models at the same time.
  Ø Client teams have limited experience in orchestration of training at scale.
  Distributed Training: split large amounts of data into chunks, train the chunks across many instances, then combine the outputs at the end.
  Multi Job Training: used when a generalised model does not represent the characteristics of the data or different hyperparameters are needed, e.g. per location or product group. This involves running multiple training processes for different data sets at the same time.
  Data Parallelism: using many cores or instances to train algorithms like GPT-3 that have billions of parameters.
  Model Parallelism: splitting up training for a deep learning model with dense and/or a large number of layers, as a single GPU cannot handle it.
  Pipe vs File: improving training times by streaming data incrementally into models during training, instead of requiring all the data to be downloaded before training can start.
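Pipe mode and sharded, multi-instance training are both switches on the training job itself. The sketch below builds an illustrative parameter set for `boto3.client("sagemaker").create_training_job(**params)`; the job name, image URI, role and S3 paths are placeholders you would supply:

```python
def build_training_job_params(job_name, image_uri, role_arn,
                              s3_train_uri, s3_output_uri,
                              instance_count=4):
    """Pipe input mode streams data into the containers instead of
    downloading it first; instance_count > 1 plus ShardedByS3Key
    spreads the data chunks across instances for data parallelism
    (for algorithms that support distributed training)."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "Pipe",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": s3_train_uri,
                # Each instance receives its own shard of the data.
                "S3DataDistributionType": "ShardedByS3Key",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": s3_output_uri},
        "ResourceConfig": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": instance_count,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }
```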
  • 16. ML OPS – INFERENCE TYPES
  ➤ Real Time: business critical; common uses are chat bots, classifiers, recommenders or linear regressors (credit risk, journey times, etc.). Hundreds or thousands of individual predictions per second. API driven with low latency, typically below 135 ms at the 90th percentile.
  ➤ Near Real Time: commonly used for image classification or file analysis. Hundreds of individual predictions per minute, with processing done within seconds. Event or message queue based; predictions are sent back or stored.
  ➤ Occasional: examples are simple classifiers like tax codes. Only a few predictions a month, with processing completed within minutes. API, event or message queue based; predictions sent back or stored.
  ➤ Batch: end of month reporting, invoice generation, warranty plan management. Runs daily / monthly / at set times. The data set is typically millions or tens of millions of rows at once.
  ➤ Micro Batch: anomaly detection, invoice approval and image processing. Executed regularly — every X minutes or Y number of events, triggered by file upload or data ingestion. The data set is typically hundreds or thousands of rows at once.
  ➤ Edge: used for computer vision, fault detection in manufacturing. Runs on mobile phone apps and low power devices; uses sensors (e.g. video, location, or heat). Model output is normally sent back to the cloud at regular intervals for analysis.
  • 17. AMAZON SAGEMAKER – INFERENCE ENGINES
  Docker containers host the inference engines; inference engines can be written in any language and endpoints can use more than one container. The primary container needs to implement a simple REST API (http://localhost:8080/ping and http://localhost:8080/invocations, with custom metadata in the X-Amzn-SageMaker-Custom-Attributes header) and loads the model artefacts from model.tar.gz, unpacked by SageMaker at /opt/ml/model.
  Common engines:
  ➤ 685385470294.dkr.ecr.eu-west-1.amazonaws.com/xgboost:1
  ➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.11-cpu-py2
  ➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.11-gpu-py2
  ➤ 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-inference:1.13-gpu
  ➤ 520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow-serving:1.11-cpu
  Example Dockerfile:
    FROM tensorflow/serving:latest
    RUN apt-get update && apt-get install -y --no-install-recommends nginx git
    RUN mkdir -p /opt/ml/model
    COPY nginx.conf /etc/nginx/nginx.conf
    ENTRYPOINT service nginx start | tensorflow_model_server --rest_api_port=8501 --model_config_file=/opt/ml/model/models.config
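As a minimal sketch of the REST contract a primary container must satisfy, here is a standard-library-only server exposing /ping and /invocations. The `predict` function is a placeholder: a real container would load the model from /opt/ml/model instead of averaging the input:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(payload):
    """Stand-in inference function (dummy 'model': mean of features)."""
    features = payload["features"]
    return {"score": sum(features) / len(features)}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # SageMaker calls GET /ping as a health check; 200 = healthy.
        status = 200 if self.path == "/ping" else 404
        self.send_response(status)
        self.end_headers()

    def do_POST(self):
        if self.path != "/invocations":
            self.send_response(404)
            self.end_headers()
            return
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(predict(payload)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def serve(port=8080):
    # SageMaker routes endpoint traffic to port 8080 in the container;
    # call serve() from the container's entrypoint.
    HTTPServer(("0.0.0.0", port), InferenceHandler).serve_forever()
```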
  • 18. AMAZON SAGEMAKER – REAL TIME INFERENCE: the logical components of an endpoint within Amazon SageMaker. All components are immutable: any configuration change requires new models and endpoint configurations. However, there is a specific SageMaker API to update instance count and variant weight. (Diagram: a named Endpoint references an Endpoint Configuration holding one or more Production Variants — each a model, instance type, initial count and weight; each inference engine and its containers use VPC, S3, and KMS + IAM; clients call the endpoint via SDKs or SigV4-signed REST requests.)
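The production-variant structure maps directly onto `boto3.client("sagemaker").create_endpoint_config(**params)`. The sketch below builds an illustrative two-variant configuration; the variant names, instance type and counts are assumptions, and the weights are relative shares of traffic:

```python
def build_endpoint_config(name, prod_model, canary_model,
                          canary_weight=0.1):
    """Endpoint configuration with two production variants: traffic is
    split between them in proportion to their variant weights."""
    return {
        "EndpointConfigName": name,
        "ProductionVariants": [
            {
                "VariantName": "primary",
                "ModelName": prod_model,
                "InstanceType": "ml.m5.large",
                "InitialInstanceCount": 2,
                "InitialVariantWeight": 1.0 - canary_weight,
            },
            {
                "VariantName": "canary",
                "ModelName": canary_model,
                "InstanceType": "ml.m5.large",
                "InitialInstanceCount": 1,
                "InitialVariantWeight": canary_weight,
            },
        ],
    }
```

Because the configuration itself is immutable, a new model version means creating a new endpoint configuration and pointing the endpoint at it with `update_endpoint`.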
  • 19. M5 INSTANCES WITH AUTOSCALING — the same experiment with M5 instances and autoscaling enabled. The autoscaling group was set between 2–4 instances and the scaling policy to 100k requests. The number of invocations continued to rise and CPU never went above 100%. A scaling event happened at 08:45 and took 5 minutes to warm up. No instances crashed and up to 4 instances were used.
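A setup like this experiment's is configured through Application Auto Scaling. The sketch below builds illustrative parameters for `register_scalable_target(**target)` and `put_scaling_policy(**policy)` on a `boto3.client("application-autoscaling")`; the endpoint and variant names are placeholders, and the 100k target mirrors the slide's policy:

```python
def build_scaling_config(endpoint_name, variant_name,
                         min_capacity=2, max_capacity=4,
                         invocations_per_instance=100_000):
    """Target-tracking scaling for a SageMaker variant, keyed on the
    predefined SageMakerVariantInvocationsPerInstance metric."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    target = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }
    policy = {
        "PolicyName": f"{endpoint_name}-invocations-scaling",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType":
                    "SageMakerVariantInvocationsPerInstance",
            },
        },
    }
    return target, policy
```

Note the warm-up lag seen in the experiment: size MinCapacity so the fleet can absorb load during the ~5 minutes a new instance takes to come into service.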
  • 20. WHY IS CPU USAGE THAT IMPORTANT? — the following chart compares the two M5 based experiments. Latency (red) increased when the CPU went over 100%; this is due to invocations having to wait within SageMaker to be processed. The two M5 experiments had a total cost of $42.96. SageMaker Studio was used instead of a SageMaker notebook instance. (Zzzzz, Phil does sleep!)
  • 21. DEV OPS WITH SAGEMAKER — the four ways to deploy new versions of models in Amazon SageMaker:
  Rolling: the default option; SageMaker will start new instances and then, once they are healthy, stop the old ones.
  Canary: done using two variants (a canary variant and a full variant) in the endpoint configuration, performed over two CloudFormation updates.
  Blue/Green: requires two CloudFormation stacks and then changing the endpoint name in the AWS Lambda using an environment variable.
  Linear: uses two variants (new and old) in the endpoint configuration, with an AWS Step Function and AWS Lambda calling the UpdateEndpointWeightsAndCapacities API to shift weight gradually.
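For the linear strategy, the Lambda invoked by the Step Function only needs to compute the next weight and call the API. A minimal sketch, with hypothetical variant names and a helper that produces the equal weight increments:

```python
def linear_weight_steps(steps=5):
    """Weights for shifting traffic to the new variant in equal
    increments, e.g. 20% at a time over five updates."""
    return [round((i + 1) / steps, 4) for i in range(steps)]

def build_weight_update(endpoint_name, new_variant, old_variant, weight):
    """Parameters for sagemaker.update_endpoint_weights_and_capacities,
    the boto3 form of the UpdateEndpointWeightsAndCapacities API."""
    return {
        "EndpointName": endpoint_name,
        "DesiredWeightsAndCapacities": [
            {"VariantName": new_variant, "DesiredWeight": weight},
            {"VariantName": old_variant, "DesiredWeight": 1.0 - weight},
        ],
    }
```

A Step Functions Wait state between updates gives each step time to prove healthy before the next weight shift; an alarm on the new variant can abort the rollout.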
  • 23. ML OPS – A 360° view: cost optimisation for training and inference. Levers include changing instance size and changing instance type; note there are no RIs or Savings Plans for ML. (Chart: daily cost Feb 20 – Jan 21 split across Inference, Training and Notebooks; overall Inference 57%, Training 15%, Notebooks 28%.)
  Top tips:
  ➤ Spot instances (surplus capacity from cloud providers) are cheaper for workloads that can handle being rerun, like batch or training. For longer execution times consider using spot instances with model checkpointing.
  ➤ Models that require GPUs for training justify additional consideration due to the use of more expensive instance types.
  ➤ For GPUs, analyse the utilisation of the GPU cores and memory; CPU and network IO all need looking at too. Make sure you feed the GPUs enough data without bottlenecking.
  ➤ Multi-Model support allows more than one model to be hosted on the same instance. This is very efficient for hosting many small models (e.g. a model per city), as hosting one per instance would give poor resource utilisation.
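Managed spot training with checkpointing is a small change to the training-job parameters. This sketch augments an existing `create_training_job` parameter dict; the S3 checkpoint path is a placeholder, and the 2:1 wait-to-run ratio is just an illustrative choice:

```python
def add_spot_training(params, max_run=3600, max_wait=7200,
                      checkpoint_s3_uri=None):
    """Return a copy of create_training_job params configured for
    managed spot capacity, with checkpointing so an interrupted job
    can resume instead of restarting from scratch."""
    params = dict(params)  # leave the caller's dict untouched
    params["EnableManagedSpotTraining"] = True
    params["StoppingCondition"] = {
        "MaxRuntimeInSeconds": max_run,
        # Must be >= MaxRuntimeInSeconds: caps total time spent
        # waiting for spot capacity plus actually training.
        "MaxWaitTimeInSeconds": max_wait,
    }
    if checkpoint_s3_uri:
        params["CheckpointConfig"] = {"S3Uri": checkpoint_s3_uri}
    return params
```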
  • 24. KPIS AND MODEL MONITORING
  Business performance and KPIs:
  ➤ The most important measure of a model is whether it accomplishes what it set out to achieve.
  ➤ This is judged by setting some clear KPIs and measuring how the model affects them.
  ➤ This can be done in a number of ways; one of the simplest and most impactful is constructing a dashboard in a BI tool like QuickSight.
  Model performance:
  ➤ SageMaker Model Monitor can be used to baseline a model and detect drift.
  ➤ Another important aspect to monitor is that predictions are within known boundaries.
  ➤ Performance monitoring of the model can trigger retraining when issues arise.
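The "predictions within known boundaries" check can be as simple as comparing batch statistics against a training-time baseline. A minimal sketch (the three-sigma threshold is an assumed convention, not from the deck; Model Monitor does a richer version of this automatically):

```python
def within_boundaries(predictions, baseline_mean, baseline_std, k=3.0):
    """Flag a batch of predictions whose mean has drifted more than
    k standard deviations from the training-time baseline."""
    mean = sum(predictions) / len(predictions)
    drifted = abs(mean - baseline_mean) > k * baseline_std
    return {"mean": mean, "drifted": drifted}
```

Pushing `drifted` to a CloudWatch custom metric lets the same alarm machinery that watches latency also trigger retraining.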
  • 25. PERFORMANCE MONITORING — an AWS CloudWatch dashboard providing complete oversight of the inference process: API error and success rates, API Gateway response times using percentiles, Lambda executions, availability recorded from a health checker, and API usage data per usage plan.
  • 26. OBSERVING INFERENCE — X-Ray traces can help you spot bottlenecks and costly areas of the code, including inside your models. (Trace map: API Gateway URL → inference functions → nested functions, SQL calls and the model itself.)
  • 27. MONITORING SAGEMAKER — Amazon SageMaker exposes metrics to AWS CloudWatch:
  Name | Unit | Statistic | Threshold | Time Period | Missing Data
  Endpoint model latency | Milliseconds | Average | > 100 | for 5 minutes | ignore
  Endpoint model invocations | Count | Sum | > 10000 (notBreaching) / < 1000 (breaching) | for 15 minutes |
  Endpoint disk usage | % | Average | > 90% (and > 80% as a lower level) | for 15 minutes | ignore
  Endpoint CPU usage | % | Average | > 90% (and > 80% as a lower level) | for 15 minutes | ignore
  Endpoint memory usage | % | Average | > 90% (and > 80% as a lower level) | for 15 minutes | ignore
  Endpoint 5XX errors | Count | Sum | > 10 | for 5 minutes | notBreaching
  Endpoint 4XX errors | Count | Sum | > 50 | for 5 minutes |
  The metrics in AWS CloudWatch can then be used for alarms:
  ➤ Always pay attention to how to handle missing data.
  ➤ Always test your alarms.
  ➤ Look to level your alarms.
  ➤ Make your alarms complement each other.
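The first table row translates directly into `cloudwatch.put_metric_alarm(**params)`. A sketch with a hypothetical endpoint name; one detail worth noting is that the CloudWatch ModelLatency metric is reported in microseconds, so the table's 100 ms threshold becomes 100,000:

```python
def build_latency_alarm(endpoint_name, variant_name="AllTraffic"):
    """Alarm for the first table row: average model latency above
    100 ms for 5 minutes, with missing data ignored."""
    return {
        "AlarmName": f"{endpoint_name}-model-latency",
        "Namespace": "AWS/SageMaker",
        "MetricName": "ModelLatency",
        "Dimensions": [
            {"Name": "EndpointName", "Value": endpoint_name},
            {"Name": "VariantName", "Value": variant_name},
        ],
        "Statistic": "Average",
        # ModelLatency is emitted in microseconds: 100 ms = 100000.
        "Threshold": 100_000.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "Period": 300,
        "EvaluationPeriods": 1,
        # "ignore" keeps the alarm in its current state when no
        # invocations arrive, per the table's Missing Data column.
        "TreatMissingData": "ignore",
    }
```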
  • 29. AUTOMATION AND PIPELINES — using automation and tools to deploy models and to maintain consistency across environments (experiments → development → pre-production → production), on a data foundation with governance and control.
  Infrastructure foundations:
  ➤ A solid data lake/warehouse with good sources of data is required for long term scaling of ML usage.
  ➤ Running models operationally also means considering availability, fault tolerance and scaling of instances.
  ➤ Having a robust security posture using multiple layers with auditability is essential.
  ➤ Consistent architecture, development approaches and deployments aid maintainability.
  Scaling and refinement:
  ➤ Did your models improve, or do they still meet, the outcomes and KPIs that you set out to affect?
  ➤ Have innovations in technology meant that complexity in development or deployment can be simplified, allowing more focus to be put on other uses of ML?
  ➤ Are your models running on the latest and most optimal hardware?
  ➤ Do you need a feature store to improve collaboration and sharing of features?
  ➤ Do you need a model registry for control and governance?
  • 30. MODEL RETRAINING — AWS Step Functions Data Science Software Development Kit:
  AWS Glue: used for raw data ingress, cleaning that data and then transforming it into a training data set.
  AWS ECS/Fargate: there are situations where you may need to run very long running processes over the data to prepare it for training. Lambda is not suitable for this due to its maximum execution time and memory limits, so Fargate is preferred in these situations.
  AWS Lambda: used to stitch elements together and perform any additional logic.
  Amazon SageMaker training jobs: the ability to run training on the data that the pipeline has prepared for you.
  Deployments to Amazon SageMaker endpoints: the ability to perform deployments from the pipeline, including blue/green, linear and canary style updates.
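The services above chain together as a Step Functions state machine. Here is a minimal Amazon States Language sketch of the Glue → training → deploy flow, built as a plain dict; the Glue job name, training parameters and deployment Lambda ARN are placeholders you would pass in (the Data Science SDK generates equivalent definitions from Python objects):

```python
def build_retraining_state_machine(glue_job, training_params, lambda_arn):
    """ASL definition: run a Glue ETL job, then a SageMaker training
    job (both via .sync service integrations that wait for
    completion), then a Lambda performing the endpoint deployment."""
    return {
        "Comment": "Model retraining pipeline (sketch)",
        "StartAt": "PrepareData",
        "States": {
            "PrepareData": {
                "Type": "Task",
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job},
                "Next": "TrainModel",
            },
            "TrainModel": {
                "Type": "Task",
                "Resource":
                    "arn:aws:states:::sagemaker:createTrainingJob.sync",
                "Parameters": training_params,
                "Next": "DeployModel",
            },
            "DeployModel": {
                "Type": "Task",
                "Resource": lambda_arn,
                "End": True,
            },
        },
    }
```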
  • 31. REFERENCES
  re:Invent and webinar:
  ➤ https://pages.awscloud.com/GLOBAL-PTNR-OE-IPC-AIML-Inawisdom-Oct-2019-reg-event.html
  ➤ https://www.youtube.com/watch?v=lx9fP_4yi2s
  My blogs:
  ➤ https://www.inawisdom.com/machine-learning/amazon-sagemaker-endpoints-inference/
  ➤ https://www.inawisdom.com/machine-learning/machine-learning-performance-more-than-skin-deep/
  ➤ https://www.inawisdom.com/machine-learning/a-model-is-for-life-not-just-for-christmas/
  Other:
  ➤ https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html
  ➤ https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data
  ➤ https://aws-step-functions-data-science-sdk.readthedocs.io/en/latest/readmelink.html#getting-started-with-sample-jupyter-notebooks
  • 33. 020 3575 1337 | [email protected] | Columba House, Adastral Park, Martlesham Heath, Ipswich, Suffolk, IP5 3RE | www.inawisdom.com | @philipbasford