AWS Machine Learning Specialty Master Cheat Sheet

Uploaded by

rahouiahmedzoubeir

SKILLCERTPRO

AWS Certified Machine Learning Specialty

Master Cheat Sheet

Distributions
 PDF continuous (normal distribution)
 PMF (mass) discrete
 Poisson – series of events where the average number of successes or failures is
known. Poisson = discrete
 Binomial multiple 0/1 trials
 Bernoulli = special case of binomial where we have ONE trial
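
As a rough illustration of the discrete distributions above, the sketch below computes binomial, Bernoulli and Poisson probabilities using only the Python standard library. The function names are illustrative and do not come from the original notes.

```python
import math

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent 0/1 trials with success probability p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def bernoulli_pmf(k, p):
    """Bernoulli is the binomial with a single trial (n = 1)."""
    return binomial_pmf(k, 1, p)

def poisson_pmf(k, lam):
    """P(exactly k events) when the average number of events per interval is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)
```

Because these are mass functions (PMF), summing them over every possible k gives 1, which is a handy sanity check.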

Time Series
 Know the difference between Seasonality vs. Trends
 Noise is present in time series
 Additive
 Multiplicative (scales with trends)

Machine Learning Concepts

3 Categories
 Supervised – Pre-labeled data
 Unsupervised – Find groupings, clusters by itself
 Reinforcement Learning – e.g. AI for video games. Action / Reward. Learn through
trial and error.
 Have set of states, and set of actions for that state and a value Q
 Add or subtract from that value Q to weigh it
Optimization
 AKA Gradient Descent, an optimization algorithm for many ML algorithms
 Find the minima of the “sum of squares” error
 Find slopes where we become less and less steep (the gradient is closer to 0). We
take smaller and smaller steps as we get closer to a zero gradient
 We can get trapped in local minima
 Set Learning rate = step size affects how long it takes for us to reach the “minima”
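
The idea above can be sketched in a few lines. This is a minimal example, assuming a one-feature linear model y = w * x + b and the mean of the squared errors as the loss; the function name and default values are illustrative only.

```python
def gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y = w * x + b by minimizing the mean of the squared errors."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient; the step shrinks as the gradient nears 0.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b
```

With this toy loss there is a single minimum, so the local-minima problem does not show up; a learning rate that is too large would still make the updates diverge.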
Regularization
 Use when the model overfits. Prevents overfitting
 Use fewer neurons, or fewer layers
 Dropout, remove neurons at random
 Early stopping, stop say at epoch 6 instead of epoch 10
 Uses regression to compute.
 Desensitize the data to a particular dimension.

 L1 regularization vs L2
 L1 is sum of the absolute values of the weights, L2 is sum of weights^2
 L1 performs feature selection, some features can go to 0, can reduce dimensionality,
more computationally intensive
 L2 nothing goes to zero. Computationally efficient. Use L2 when we think all features
are important
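
A small sketch of the two penalty terms, using the usual definitions (absolute values for L1, squares for L2). The names and the `lam` strength parameter are assumptions for illustration.

```python
def l1_penalty(weights, lam):
    """Lasso-style penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Ridge-style penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam, kind="l2"):
    """Add the chosen penalty to an already-computed data loss."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return data_loss + penalty
```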
Hyperparameters (we set before training starts) vs parameters (internal to model
that get tuned during training)
Hyperparameter
 Learning rate 0-1, step size
 Batch size, number of samples to train at a time
 Epoch, number of times we will process the training data
Cross Validation
 Don’t hold out specific records for validation
 Retrain by repartitioning and holding out a % of the data for validation each time
 k-fold cross validation
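
One possible way to produce the k repartitions, sketched with the standard library only. The helper name and the fixed seed are assumptions, not part of the original notes.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # randomize before partitioning
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

Every record ends up in the validation set exactly once, so no specific records are permanently held out.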
Feature Selection and Engineering
 Selecting relevant data to be trained on
 Remove unneeded data (low correlation, low variance, missing data) we don’t need
to train our model. Makes model training faster and hopefully more accurate
 Feature correlation, e.g. Age and Height
 Selection requires trial and error, and domain knowledge
 Engineering new features based on existing features. E.g. Height/age, or pulling the
weekday from a date
 PCA and K-means clustering (both unsupervised) can help reduce the feature set
Principal Component Analysis PCA
 Unsupervised learning algorithm
 Used for data preparations pre-processing, looks for relationships between data
using dimension reduction
 Find central point of all data on n-dimensional graph
 Turn that point into the origin on the graph
 Draw a minimum bounding box around all of the points
 Longest length of the box is PC1, next longest is PC2, etc.
 Take out dimensions that don’t affect the data much
 Project higher dimensional data into lower dimensional (like a 2d plot)
Missing and Unbalanced Data (Imputation)
 Impute a value that is missing, take the mean (Mean replacement of the column).
Median might work as well. But this isn’t very great TBH
 Remove the sample altogether
 Remove the column or feature altogether
 Unbalanced – outliers
 Outlier detection – random cut forests (AWS developed algorithm)
 1 – 2 std dev. Std dev = sqrt(variance). variance = sum of (each point – mean)^2 / number
of samples
 Unbalanced – not enough examples for all of our classes
 Can create fake data “synthesize data” using expert domain knowledge to create
more examples for your class
 Actual good imputation methods
 K nearest neighbours (numerical data)
 Deep learning

 Regression (MICE, multiple imputation by chained equations, is an algorithm to do this)
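
A minimal sketch of the simplest ideas in this section: mean replacement for missing values and a standard-deviation rule for outliers. The function names, the use of `None` for missing values and the 2-sigma default are assumptions for illustration.

```python
import math

def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def std_dev(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(variance)

def outliers(values, n_std=2.0):
    """Return values lying more than n_std standard deviations from the mean."""
    mean = sum(values) / len(values)
    sd = std_dev(values)
    return [v for v in values if abs(v - mean) > n_std * sd]
```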

Unbalanced Data
 Discrepancy between the number of positive and negative cases
 Oversample the minority case (works ok)
 Undersample (remove) the majority cases (not that great)
 SMOTE synthetic minority oversampling technique – uses KNN to artificially
generate minority cases
 Choose a different threshold for classification, change the mix of FP and FN
Label and One Hot Encoding
 Convert labels to numbers i.e. a lookup table
 One hot encoding, used for categories – e.g. country to a number. E.g. one column
with 3 countries values become 3 columns with 0/1 True False values
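
Both encodings can be sketched without any library. The helper names and the alphabetical ordering of categories are choices made for this example.

```python
def label_encode(values):
    """Map each distinct category to an integer via a lookup table."""
    lookup = {category: i for i, category in enumerate(sorted(set(values)))}
    return [lookup[v] for v in values], lookup

def one_hot_encode(values):
    """Turn one categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories
```

For a column holding three distinct countries, `one_hot_encode` returns rows of length 3 with a single 1 in each row.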
Binning
 E.g. put age 20-30 into a category, 30-40 into another
 Quantile binning = all bins have same number of records in them
Splitting and Randomization
 Training data
 Validation (tune hyper-parameters)
 Testing
 Always randomize the order of the training data, to get rid of any biases we may
have introduced during the collection period
 Testing data should also be picked randomly
 Randomize before doing anything
RecordIO
 AWS format
 Pipe mode, streams data so we don’t need to submit records individually
 Faster training throughput
 Best format for SageMaker
Vanishing gradient problem (ways to tackle them)
 Multi level hierarchy – break up your NN into sub networks that are trained
individually
 LSTM
 Residual Networks (ResNet)
 ReLu activation function
Gradient Checking
 A debugging technique
 Used to numerically check the derivative values
 Used to validate NN code
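
As a sketch of the technique, the snippet below compares a hand-written derivative with a central-difference estimate. The function names, step size and tolerance are illustrative assumptions.

```python
def numerical_gradient(f, x, eps=1e-5):
    """Central-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def gradient_check(f, analytic_grad, x, tol=1e-6):
    """Return True when the analytic gradient matches the numerical estimate."""
    numeric = numerical_gradient(f, x)
    analytic = analytic_grad(x)
    # Relative error keeps the comparison meaningful across scales.
    denom = max(abs(numeric) + abs(analytic), 1e-12)
    return abs(numeric - analytic) / denom < tol
```

A failed check usually points to a bug in the hand-written backprop code rather than in the numerical estimate.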

ML Algorithms
Logistic Regression
 Supervised
 Binary yes no output
 Fit a sigmoid function (S shaped), less likely to be skewed by outliers

Linear Regression
 Supervised
 Numeric output

SVMs
 Supervised
 Classification output
 Partition into groups with furthest distance
 Where’s the best line or hyperplane to separate two classes?

Decision Trees
 Supervised
 Binary, Numeric and Classification output
 Root node is one with most correlation with the label

Random Forests
 Supervised
 Binary, Numeric and Classification output
 A collection of decision trees
 DTs have a drawback, inaccuracies
 RF makes DTs more accurate
 RF picks random features and ignores the other features. Builds a DT. Repeat this
so that we get many DTs
 We run a record through all of the trees to get our result. Then we vote based on all
of the results

K-Means
 Unsupervised clustering
 Classification
 K is the number of clusters we want to find
 Tries to find centre points for each cluster until we reach equilibrium
 What value of K should we use?
 Use variation, least variation wins
 Plot the reduction in variation vs. number of clusters
 This graph looks like an elbow plot. The elbow’s number of clusters is what we want

K-Nearest Neighbour
 Supervised
 Classification
 Often used after K-means

 Used to classify new data based on clusters

 Uses K-number of nearest neighbours to classify
 E.g. k=7, classify based on the classes of the 7 nearest neighbours
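
A minimal sketch of the voting step, assuming points are tuples of numbers and Euclidean distance is the similarity measure. The function name is illustrative.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]
```

An odd k is a common choice for two-class problems because it avoids tied votes.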

Latent Dirichlet Allocation (LDA)

 Unsupervised
 Classification or Other
 Used for text analysis
 Text classification, topic discovery, document tagging, sentiment analysis
 Documents are made of Topics, and Topics are made of Words
 Collection of Documents is a Corpus
 Remove stop words, “and but if”
 “Stemming” Learning, learnability -> learn
 Tokenize (turn words into an array)
 Choose the number of topics (k)

Deep Learning
Activation Functions
 Linear (no use for backprop, the derivative is a constant). Does ‘nothing’, just outputs the input that
was given
 Binary step functions – don’t work with derivatives (looks like this _|-)
 We want non-linear activation functions:
 Sigmoid (Logistic)
 TanH (more widely used than sigmoid), centred around 0. Good for RNN
 ReLU rectified linear unit (looks like this _/). Very popular, fast computation
 Leaky ReLU, other variants
 Softmax – often the final output layer of classification problem. Converts outputs to
probabilities of each classification. Only outputs ONE label
 Sigmoid can output more than one label, e.g image has X and Y
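
The activation functions listed above can be written out directly. This is a plain-Python sketch for single values and short lists; the leaky slope of 0.01 is a common default assumed here, not a value from the notes.

```python
import math

def sigmoid(x):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squashes any real number into (-1, 1), centred on 0."""
    return math.tanh(x)

def relu(x):
    """Passes positive inputs through unchanged and clips negatives to 0."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Like ReLU, but keeps a small slope for negative inputs."""
    return x if x > 0 else slope * x

def softmax(xs):
    """Converts a list of scores into probabilities that sum to 1."""
    shift = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax probabilities compete with each other (they sum to 1), which is why it suits single-label outputs, while independent sigmoids suit multi-label outputs.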

Neural Networks
 Input layer, hidden layers, output layers
 Activation function inside hidden layers introduces nonlinearity,
 sigmoid (0 to 1), ReLU family (most common), Tanh (-1, 1)
 ReLU (0 to infinity) piecewise, looks like: _/
 A Bias is introduced in hidden layers to prevent a 0 value, as 0* anything is 0, and
keep this neuron “active” in the network
 Adjusting bias and weights is how we tune a NN
 Forward pass
 Back propagation to optimize a loss function using Gradient descent
 Forward + backward pass over one batch = 1 iteration. One pass over the whole training set = 1 epoch

Convolutional NN
 Supervised, classification
 Used when we don’t know where to find our feature, e.g. find something in an image,
find features in a sentence
 Used in image classification
 Hidden layers are convolutional layers
 A convolutional neuron does more than just an activation function
 Combine multiple filters and those calculations form the value that is output from the
neuron
 Filters can be pre trained

Recurrent NN
 Supervised, other output (time series data, voice recognition, translation)
 RNN can remember a bit, a small amount of data from past inferences
 Deals with sequences in time, or sentences, stock behavior, website logs
 Long short term memory (LSTM) and Gated Recurrent Units (GRU) are RNNs.
These solve the issue where the RNN is more biased towards more recent data
compared to earlier data
 GRU slightly less performant but trains faster as it is simpler
 LSTM more performant but more computationally expensive
 The previous input into the neuron at time 1 is an input for the next activation at time
2. It is a “memory cell”

Learning Rate
 Is a hyperparameter
 Too high, can overshoot the minima and miss the optimal solution (potentially)
 Too low means we take a ton of epochs to reach the minima (increase training
time)

Batch Size
 # of training samples used in each training iteration (one weight update)
 Smaller batch sizes tend to not get stuck in local minima when compared to large
batch sizes
 Large batch sizes can converge on the wrong solution at random

Model Performance and Optimization

Confusion Matrix
 Visualize testing on models
 Which model should we use? We need to test different models

 Good – True positives, correctly predict positive for an actual positive

 Good – True negatives
 Bad – FP, FN
 Compare confusion matrices that are generated from different algorithms
 Can be 3×3, 4×4 etc.

Sensitivity (Recall) and Specificity

 Sensitivity = true positive rate = recall = TP/(TP+FN), closer to 1 is better, less false
negatives
 Specificity = true negative rate = TN/(TN+FP), closer to 1 is better, less false
positives
 False positives more acceptable in your business case? E.g. flagging a non-fraud case
as fraud is OK as long as we can catch all fraud = High sensitivity. Or medical, where a
false positive is OK, as we can follow up to check.
 High specificity, classify video content so that we don’t allow child to watch adult
content. False positives are unacceptable. We prefer false negatives, i.e. video is
suitable but we don’t allow it
 Sensitivity = cast a net, I want to catch ALL THE FISH in the lake. I don’t care if I
catch frogs, bugs etc as long as I get all of the fish
 Specificity = cast a net, I ONLY want to catch fish, I don’t want my net to have other
things in it, but I’m ok with only catching some, and not all of the fish in the lake

Accuracy and Precision

 Accuracy = how right am I overall = (TP + TN) / Total, 1 may indicate overfit
 Precision = proportion of predicted positives that are actually positive =
TP/(TP+FP) (only predicted positives in the calc)
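
The four formulas can be gathered into one helper that works from the cells of a binary confusion matrix. The function name and dictionary keys are illustrative.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common metrics from the four cells of a binary confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # recall, true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),    # share of predicted positives that are right
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```

For example, with 40 true positives, 10 false positives, 45 true negatives and 5 false negatives, precision is 40/50 and accuracy is 85/100.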

ROC/AUC
 How to set threshold, or cutoff point for sensitivity vs specificity
 Build lots of confusion matrices and graph Sensitivity vs (1- specificity)
 It’s a graph from 0 to 1
 That line is receiver operating characteristics. Look for the knee points
 It allows us to find the best model for max specificity and max sensitivity
 AUC is area under curve is how well this model performs. More area is better, max
area is 1. More AUC means this model is better
 ROC – balance between sensitivity and specificity
 AUC – compare different models in terms of their separation power. 0.5 is useless as
it’s the diagonal line. 1 is perfect
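
A small sketch of how the curve and its area might be computed from model scores, assuming labels are 1 for positive and 0 for negative and that both classes are present. The helper names are illustrative.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step so more records are predicted positive.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Each threshold corresponds to one confusion matrix; the x value is 1 - specificity and the y value is sensitivity.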

Gini Impurity
 Information gain algorithm to see how to create the first node and make the best split
in a decision tree
 1 – (probability of class A)^2 – (probability of other)^2

 Then calculate a weighted avg using the total numbers and the two gini impurities
 Repeat for all of the features that exist
 Compare the weighted gini impurity. Lowest is better. It best separates the classes.
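
The calculation can be sketched as follows, assuming each candidate split sends records to a left and a right child node. The function names are illustrative.

```python
def gini_impurity(labels):
    """1 minus the sum of squared class probabilities in a node."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

def weighted_gini(left_labels, right_labels):
    """Size-weighted average impurity of the two child nodes of a split."""
    total = len(left_labels) + len(right_labels)
    return (
        len(left_labels) / total * gini_impurity(left_labels)
        + len(right_labels) / total * gini_impurity(right_labels)
    )
```

A split that separates the classes perfectly scores 0, and a split that leaves both children evenly mixed scores 0.5 for two classes.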

F1 Score
 Often used as a replacement for accuracy
 F1 combines recall and precision, takes into account more than accuracy
 2/(1/recall + 1/precision) or 2TP/(2TP+FP+FN)
 Higher is better
 Use when we care about both precision and recall
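
The two forms of the formula give the same number, which the sketch below makes easy to check. The function names are illustrative.

```python
def f1_from_rates(recall, precision):
    """Harmonic mean of recall and precision."""
    return 2 / (1 / recall + 1 / precision)

def f1_from_counts(tp, fp, fn):
    """Same score computed directly from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)
```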

TF-IDF
 Term frequency and Inverse Document Frequency
 What terms are most relevant for a document
 Term Frequency – how often a word appears in a doc
 Document Frequency – how often a word appears in ALL DOCS (this can let us get
rid of words used everywhere like “and”, “But”)
 TF / DF or TF * IDF. IDF = 1/DF
 Unigrams, Bigrams, n-grams
 I love certification exams
 unigrams = every word individually
 Bi-grams = every two consecutive words “i love” “love certification” “certification
exams”
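
A minimal sketch of n-gram extraction and the simple TF / DF ratio described above. Production implementations usually take the logarithm of the inverse document frequency; the plain ratio is kept here to match the notes. The helper names and the whitespace tokenizer are assumptions.

```python
def ngrams(text, n):
    """Lower-case, split on whitespace and return every run of n consecutive words."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def tf_idf(term, document, corpus):
    """Term frequency in one document divided by the share of documents containing the term."""
    words = document.lower().split()
    tf = words.count(term) / len(words)
    docs_with_term = sum(1 for doc in corpus if term in doc.lower().split())
    df = docs_with_term / len(corpus)
    return tf / df if df else 0.0
```

A word that appears in every document gets a low score, while a word concentrated in one document scores higher.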

Ensemble Learning
 Bagging – generate new training sets with random sampling with replacement
 Boosting – training sets will have weights, and as we retrain the weights will change
 Boosting generally has better accuracy, Bagging avoids overfitting

Machine Learning Tools and Frameworks

Jupyter
 Sagemaker is Jupyter as a service
 Runs boxed environments with separated dependencies

ML and DL frameworks
 Keras is an easier way to access Tensorflow (Google)
 AWS is MXNet and Gluon. Gluon is an abstraction of MXNet (like Keras)
 Pytorch and Scikit learn

Tensorflow
 Graph object is like an array that you populate with code lines from top to bottom
 You store constants, variables with no assigned value yet (placeholder), into the
default graph of tensorflow
 You can add operations like multiply, add to the graph sequentially
 You can run the graph using Sessions

PyTorch
 Create a tensor which is a multi dimensional array with zeros
 You can just do simple operations like add, multiply on those matrices directly
 You need to add requires_grad=True to add memory to store the order of operations
that were done for back propagation purposes
 Graph is created on the fly

MXNet
 Graph is created on the fly
 Similar to PyTorch, turn on the autograd feature to get backprop

Scikit Learn
 Has lots of built in datasets to use
 Support for less popular models. The others focus more on neural nets

Pandas
 Dataframe = table
 Manipulate data

AWS
S3
 Use as a “Data lake”
 Kinesis->S3->Athena->SageMaker
 Athena perform queries against S3 using SQL
 AWS Glue service
 Security and encryption

Glue

 Glue Data Catalog

 Metadata repository for datatables in S3
 Integrates with Athena and Redshift
 Glue Crawler
 Can crawl your data and discover the schema
 Can figure out folder partitions
 First create the crawler
 requires an IAM role to access resources in AWS
 The crawler can run on demand or a repeated basis
 Glue ETL
 Has bundled “transformations” like dropnulls, dropfield, and an ML one called
“FindMatches” which can detect duplicate records
 Transform data from 3 different sources: databases (JDBC), S3, or dynamodb
 Frequently used as an input to Athena
 We can’t access the actual data from Glue, for that we need to query it in Athena
 Glue vs. Data Pipelines: DP is an orchestration service, it doesn’t do the actual ETL
for you
 Glue is completely serverless

Athena
 SQL interface into S3 data lakes
 Works on many different formats, json, csv, parquet etc
 Save query results back into S3 – preprocess data before ML
 After running a query, a csv file is created in an auto-generated bucket which stores
the query result
 Can save queries
 Serverless
 Can create tables or views from queries
 Works well with Glue Catalog

Amazon Quicksight
 Visualize data sources, end user targeted. Should use federated auth
 Not AWS service really
 Create dashboards, reports, graphs
 Quicksight can natively connect to many AWS areas for data use

Kinesis: Data Streams, Data Firehose, Video Streams and Data Analytics
 Ingest lots of data real time, large datasets, iot
 Firehose is a delivery/ingestion service to send into some permanent storage,
one of 4 places: S3, Redshift, Elasticsearch, Splunk (or kinesis data analytics)
 Fully managed NEAR real time
 Can auto convert CSV and JSON to Parquet or ORC when destination is S3 (using
Glue)

 Can compress gzip, zip, snappy when destination is S3

 Serverless means you must use Lambda to do data conversions – there are a bunch
of blueprints we can use
 Data streams is generic endpoints (consuming applications), it can stream into
EMR/Spark, Lambda, Kinesis Data Analytics. Requires setting up shards. More
shards = more capacity, ~1mb/sec/shard. Real time
 Can write custom code for producer and consumer – create real time applications
 Data Analytics – Process Kinesis data Streams or Firehose using SQL, Flink, Java
 Select from Stream1…
 Input stream is Kinesis data stream or firehose
 Has output stream and error streams
 For streaming ETL
 SQL can create from template
 Remember: can do SQL function RANDOM CUT FOREST for anomaly detection on
columns in a stream
 HOTSPOT, detect dense areas in your data
 Firehose, can take in a never-ending stream of json, dump it into S3

EMR with Spark (Elastic Map Reduce)

 Managed Hadoop service for parallel compute tasks
 e.g. massive data sets that we need to normalize or transform before ML
 Spark is kind of a better “Mapreduce” for EMR
 Petabyte scale
 Integrated with S3 – we can use S3 as a mounted filesystem instead of HDFS
(hadoop filesystem) through “EMRFS”
 Has a Master node, core nodes and task nodes inside the cluster
 Master node manages cluster
 Each node plays a different role
 Core nodes store data on hadoop filesystem
 Task nodes can be spot instances (optional node) as they don’t store data on
filesystem
 Apache spark is an analytics engine, runs in an EMR cluster
 Spark runs in SageMaker and SparkML runs in EMR too
 Typical workflow, S3 -> EMR/Spark -> SageMaker

EC2 for ML
 Use compute optimized or Accelerated Compute (GPU instances)
 There’s a class of ml.* instances but those are only available from SageMaker
 Lots of AMIs preloaded with machine learning languages and libraries
 Conda libraries
 Base AMIs with GPU libraries
 You must request limit increases to use any ML suitable compute instances

Batch
 Docker images, serverless

 Cloudwatch and step functions can trigger Batch

Data Engineering Summary

Here’s a quick summary of all the services we’ve mentioned

 Amazon S3: Object Storage for your data

 VPC Endpoint Gateway: Privately access your S3 bucket without going through the
public internet
 Kinesis Data Streams: real-time data streams, need capacity planning, real-time
applications
 Kinesis Data Firehose: near real-time data ingestion to S3, Redshift, ElasticSearch,
Splunk
 Kinesis Data Analytics: SQL transformations on streaming data
 Kinesis Video Streams: real-time video feeds
 Glue Data Catalog & Crawlers: Metadata repositories for schemas and datasets in
your account
 Glue ETL: ETL Jobs as Spark programs, run on a serverless Spark Cluster
 DynamoDB: NoSQL store
 Redshift: Data Warehousing for OLAP, SQL language
 Redshift Spectrum: Redshift on data in S3 (without the need to load it first in
Redshift)
 RDS / Aurora: Relational Data Store for OLTP, SQL language
 ElasticSearch: index for your data, search capability, clickstream analytics
 ElastiCache: data cache technology
 Data Pipelines: Orchestration of ETL jobs between RDS, DynamoDB, S3. Runs on
EC2 instances
 Batch: batch jobs run as Docker containers – not just for data, manages EC2
instances for you
 DMS: Database Migration Service, 1-to-1 CDC replication, no ETL
 Step Functions: Orchestration of workflows, audit, retry mechanisms
Briefly mentioned:

 EMR: Managed Hadoop Clusters

 Quicksight: Visualization Tool
 Rekognition: ML Service
 SageMaker: ML Service
 DeepLens: camera by Amazon
 Athena: Serverless Query of your data

Built in AWS ML Services

Rekognition
 Image and video analysis (object and scene detection)
 Pretrained deep learning
 Image moderation

 Facial analysis – age, gender, smiling etc

 Celebrity recognition
 Face comparison – can we find a face from this image in a target image
 Text in image (e.g. signs text)
 Video can be streamed or stored somewhere such as S3
 Stored video – S3 -> Lambda triggered on upload -> Rekognition
 Streaming video – Kinesis video stream -> Rekognition -> Kinesis data stream

Polly
 Text to speech
 Many languages, Female or male voices
 Upload a lexicon to customize pronunciation (read full acronyms)
 Can pass text in, in SSML format “speech synthesis markup language”, looks like
XML. Like you can put in whisper effects, pauses

Transcribe
 Speech to Text (ASR – Automatic speech recognition)
 Real time or analyse pre-recorded files
 You can create custom vocabularies
 Put words in a text file, specify the language and upload it (or put it into S3)
 You can create transcription jobs
 Speaker identification

Translate
 Batch or real time
 Supports custom terminology you can pass in dictionaries in csv or tmx format

Comprehend
 Text analysis (NLP)
 Can train on our own data
 Features
 Keyphrase extraction
 Sentiment analysis (+ve, -ve, neutral, mixed)
 Syntax analysis (separate into pronouns, verbs, adjectives etc)
 Entity recognition (names, organizations, dates)
 Custom entities
 Language detection
 Custom classification (provide training data)
 Topic modelling
 Multi language support

Lex

 Powers Alexa
 Conversation interface service for chatbots
 Tries to understand intent from your speech
 Create a “bot”, can output voice or output text
 Then create Utterances (training data) and “Intent” (labels)

Forecast
 Time series forecasting

Service Chaining with AWS Step Functions

 Combining multiple AWS services to create a full solution out of the ML services
 S3 for storage, trigger Lambda
 Translate => Comprehend
 AWS Step Functions orchestrates multiple lambda functions together.
 State machine
 Step functions can pause or wait, e.g poll a service for status (for asynchronous
services). Lambda has execution time limits, that we can get around using step
functions
 It’s logic between lambda functions

Sagemaker
 Build train and deploy ML models (3 stages of Sagemaker)
 Fully managed service
 End to end lifecycle of ML
 Lots of managed algorithms, we just choose hyperparameters
 Can access/control Sagemaker using the console (web), API (boto3), Python SDK
and Jupyter notebooks
 Notebooks
 Have Notebook instance types, with ml. prefix
 You can still spin up other instances, you’re not tied to your Notebook Instance Type
 You can give access to S3 buckets
 You don’t have access to a VPC by default unless you set one
 You access the notebook instance through a presigned url
 Lifecycle configurations are used to run bash commands that run before your
notebook instance starts

Sagemaker Build

Data Preprocessing
 Visualize your data (notebooks)
 Explore data
 Feature engineering

 Synthesize (generate more training data for certain labels if we have less cases)
 Convert, e.g. images to recordIO, csv into something else
 Split data (validate, test, train)
 Structure

Amazon Ground Truth

 Build training datasets
 Reduce data labelling costs
 When we have data but it is not labelled
 Workflow includes humans to label (Mechanical Turk)
 Can also be private human team (internal to your company)
 You provide instructions to tell people how to label

Preprocessing image Data

 We don’t have enough images, only 60 of a certain class, what can we do?
 Rotate and transform the images of that class to generate more training data, e.g.
sharpen, colour contrasts

Algorithms
 3 sources for Sagemaker
 Built-in to Sagemaker
 AWS Marketplace
 Custom
 Linear Learner
 Can do regression and classification
 Input: RecordIO (preferred), CSV. Inputs can be pipe (faster) or file mode
 Must normalize data first
 BlazingText
 Text classification, sentiment analysis, etc
 Used by Amazon Comprehend probably
 2 modes: Word2vec or text classification
 Object2Vec
 turn objects into features
 Unsupervised, figure out similarity between objects
 Image Classification Alg (object detection is bounding boxes)
 Conv NN
 Image recognition (possibly powers Amazon Rekognition)
 Can use “transfer learning” i.e. build on an existing model
 K-Means
 Web scale k-means clustering algorithm
 Find discrete groupings within data (unsupervised algorithm)
 Latent Dirichlet Allocation (LDA)
 Text analysis, topic discovery
 Unsupervised
 Amazon Comprehend

 Principal Component Analysis (PCA)

 Reduce dimensionality (number of features)
 XGBoost
 Extreme gradient boosting – high performance decision tree algorithms
 Boosted group of decision trees
 Use to make predictions from tabular data
 Not deep learning algorithm
 Lots of hyperparameters
 Seq2Seq
 RNNs, input is a sequence of tokens and output is also a sequence of tokens
 Machine translation, text summarization
 DeepAR
 Forecasting 1d time series data
 RNN
 Random Cut Forest
 Unsupervised
 Anomaly detection
 There’s a lot more algorithms built in to sagemaker

Sagemaker Train

Architecture
 ECR + docker images
 Can create our own images
 docker image structure /opt/ml/code, /opt/ml/model
 S3 (Training data) – or elastic file system, or FSx for Lustre
 Has “Channels” that need to be defined e.g. train, validation, model
 Channel tells what kind of data this is
 EC2 instances (ML class) – we can’t get into the OS of these
 P2 family is GPU
 Sometimes can elastically attach GPUs to an instance
 There’s “spot instances” for training called “managed spot training”
 Can keep state using “checkpoints” in s3 if your instance is destroyed. It stops
gracefully

Training an Image Classifier

 Create “training job” from sagemaker
 Requires a role, e.g. to get and write data from s3 buckets where our training data is
 Choose an algorithm (aka which ecr container)
 File input vs pipe input
 Accuracy metrics are published into Cloudwatch
 Choose instance size. Some algorithms WILL require you to use GPU instances
 Max execution time
 VPCs
 Hyperparameters. A lot of defaults are set for us, but some require us to fill in e.g.
 Number of classes (neurons in output layer)

 Number of training samples

 Image dimensions, colour channels
 Input data configuration
 Channel name (train, validate, training labels, validation labels)
 Location (e.g. S3 or file system)
 Output data configuration
 S3 location

Hyperparameter Tuning
 Sagemaker auto parameter tuning as a service
 Choose an algorithm -> Set ranges of hyperparameters -> Choose metric to
measure (e.g. maximize area under curve)
 Sagemaker will run a whole bunch of training models in parallel. There is a “tuning
model” looking at the hyperparams

Sagemaker Deploy

Inference Pipelines
 Chaining models together
 Pass output of one model to be used as input to another model

Real-Time and Batch Inference

 Real-time inference has a SageMaker Endpoint (internal not public)
 We can call a model in real time to get the result (inference) by InvokeEndpoint from
EC2, Lambda
 Batch inference – Create a “batch transform job” likely with data from S3. Push that
S3 into the batch job, and then the output goes back into S3

Deploy
 Create a model definition
 Choose a IAM role, pass in the “training image” (ECR docker container) from the
“Training Job”, the model S3 location
 Create an Endpoint configuration
 Point it to the model definition
 Create the Endpoint
 Name, choose the endpoint configuration
 We use the Endpoint to make inferences. Endpoints can’t be accessed
publicly. You can access it from Lambda, or CLI, with the “aws sagemaker-runtime
invoke-endpoint….” Command
 After invoking, the output is probably just an array of numbers, labels, whatever
 Accessing SageMaker Endpoints from an App
 AWS api/sdk -> SageMaker Endpoint is one way
 API Gateway -> Lambda -> SageMaker Endpoint is another
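
As a hedged sketch of the SDK route, the helper below wraps the boto3 `invoke_endpoint` call of the `sagemaker-runtime` client. It assumes the model accepts CSV input; the helper name, the endpoint name `my-endpoint` and the sample feature values are hypothetical.

```python
def invoke_endpoint(client, endpoint_name, rows):
    """Send rows of numeric features as CSV and return the decoded response body."""
    payload = "\n".join(",".join(str(value) for value in row) for row in rows)
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=payload.encode("utf-8"),
    )
    return response["Body"].read().decode("utf-8")

# Usage sketch (needs AWS credentials and a deployed endpoint; names are hypothetical):
#   import boto3
#   client = boto3.client("sagemaker-runtime")
#   print(invoke_endpoint(client, "my-endpoint", [[5.1, 3.5, 1.4, 0.2]]))
```

Passing the client in as an argument keeps the helper easy to test with a stand-in object and lets the same code run inside Lambda or on EC2.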

Security
SageMaker Notebooks
 IAM policy – CreatePresignedNotebookInstanceUrl –
 Give notebook root access (server access)? Set this during creation; default is true.
Lifecycle scripts run as root.
 SageMaker instance profiles, e.g. to grant permissions to S3
 SageMaker doesn’t support resource-based policies e.g. an S3 bucket policy
 From Notebooks, we can see S3 buckets, and files, but we can’t copy them by
default

SageMaker VPCs
 The default is a public VPC i.e. access to internet
 If we are in a private VPC, we need a S3 VPC endpoint to access S3

Other
 Horovod or parameter servers – how to do distributed training in tensorflow
 Production variants – how to do a/b (% traffic to A, % to B) testing of models using
production data
 Amazon NEO is a cross compiler that lets you use models in different architectures

AWS Machine Learning-Specialty Summary:

 AWS course page: https://aws.amazon.com/certification/certified-machine-learning-specialty/
 AWS exam guide: https://d1.awsstatic.com/training-and-certification/docs-ml/AWS%20Certified%20Machine%20Learning%20-%20Specialty_Exam%20Guide%20(1).pdf
 AWS sample questions: https://d1.awsstatic.com/training-and-certification/docs-ml/AWS%20Certified%20Machine%20Learning%20-%20Specialty_Sample%20Questions.pdf

Domain 1: Data Engineering (20%)

Create data repositories for machine learning. Identify and implement a data-
ingestion solution. Identify and implement a data-transformation solution. Opinion:
IMHO this domain should be reduced to 15% or even 10%. I found the questions
pretty repetitive, and they were about Big Data, not about Machine Learning. If
you’ve already passed the Big Data Specialty certification, you’ll be fine. If not, make
sure you’re very familiar with Kinesis and its different flavours, or you’ll have a
miserable time.

Domain 2: Exploratory Data Analysis (24%)

Sanitize and prepare data for modeling. Perform feature engineering. Analyze and
visualize data for machine learning. Opinion: typical Data Science stuff, not really tied
to any particular AWS service. Cleaning data, handling missing values, performing
basic feature engineering. If you have hands-on ML experience, this won’t be a
problem at all. Questions don’t go very deep. I was surprised to get a few questions
on data viz, most of them pretty vague and awkward to answer without looking at
any actual data. IMHO they should be dropped and replaced with more questions on
feature engineering.

Domain 3: Modeling (36%)

Frame business problems as machine learning problems. Select the appropriate
model(s) for a given machine learning problem. Train machine learning models.
Perform hyperparameter optimization. Evaluate machine learning models. Opinion: a
reasonable mix of high-level questions on framing business problems (algo selection,
etc.), SageMaker-related questions (built-in algos, HPO, etc.) and Deep Learning
questions (CNN, LSTM, regularization, etc.). Again, if you do this for a living and if
you’ve spent some time with SageMaker, you should be fine. I didn’t get any
complex algorithm question, and none on specific Deep Learning frameworks
(TensorFlow, etc.). IMHO, this could be a little more challenging than it is :)

Domain 4: Machine Learning Implementation and Operations (20%)

Build machine learning solutions for performance, availability, scalability, resiliency,
and fault tolerance. Recommend and implement the appropriate machine learning
services and features for a given problem. Apply basic AWS security practices to
machine learning solutions. Deploy and operationalize machine learning solutions.

 Query logging using Athena
 CloudTrail integration with Athena
 Amazon Macie
 Glacier Vault Lock
 QuickSight
 Different e-mail protocols and their secure ports: https://www.siteground.com/tutorials/email/protocols-pop3-smtp-imap/


White paper
 https://d1.awsstatic.com/whitepapers/Security/AWS_Security_Whitepaper.pdf
 https://d1.awsstatic.com/whitepapers/aws-kms-best-practices.pdf
 https://d0.awsstatic.com/whitepapers/compliance/AWSSecurityatScaleLogginginAWS_Whitepaper.pdf
 https://d1.awsstatic.com/whitepapers/architecture/AWS-Security-Pillar.pdf
 https://d1.awsstatic.com/whitepapers/Security/DDoS_White_Paper.pdf
 https://d1.awsstatic.com/whitepapers/Security/Secure_content_delivery_with_CloudFront_whitepaper.pdf
 https://d0.awsstatic.com/whitepapers/compliance/AWS_Security_at_Scale_Logging_in_AWS_Whitepaper.pdf

 AWS SageMaker Developer Guide
 https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html

 Wong Tai Sin's recommended video on SageMaker's 9 big built-in algorithms

 Learning Path and AWS university from AWS's official website:
 https://aws.amazon.com/training/learning-paths/machine-learning/exam-preparation/

Machine learning


Model parameter:

Parameters are the values the model learns during training, such as weights and
biases.

Hyperparameter:

Hyperparameters are the values we supply to the model, for example the number of
hidden nodes and layers, the input features, the learning rate, and the activation
function in a neural network.

Difference between model parameter and hyperparameter:

 https://machinelearningmastery.com/difference-between-a-parameter-and-a-hyperparameter/
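The distinction can be sketched with a tiny linear model: in the illustrative code below, `learning_rate` and `epochs` are hyperparameters we choose up front, while `w` and `b` are the parameters learned from the data. The function name and the toy data are assumptions made for this example.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on mean squared error.

    learning_rate and epochs are hyperparameters (chosen by us);
    w and b are model parameters (learned from the data).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data generated from y = 2x + 1
w, b = fit_line(xs, ys)
```

After training, `w` and `b` should be close to 2 and 1, the values used to generate the toy data.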

Learning Rate:

Determines the size of the step taken during gradient descent optimization.
Typically a value between 0 and 1.
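A minimal sketch of why the step size matters, using gradient descent on the toy function f(x) = (x - 3)^2, whose minimum is at x = 3. The function and the three learning rates are chosen purely for illustration.

```python
def descend(learning_rate, steps=50, start=0.0):
    """Run gradient descent on f(x) = (x - 3)^2 and return the final x."""
    x = start
    for _ in range(steps):
        gradient = 2 * (x - 3)  # derivative of (x - 3)^2
        x -= learning_rate * gradient
    return x


slow = descend(0.01)     # small steps: still far from the minimum after 50 steps
good = descend(0.1)      # moderate steps: ends very close to 3
diverged = descend(1.1)  # steps too large: overshoots further on every update
```

The same number of steps gives three very different outcomes, which is why the learning rate is usually one of the first hyperparameters to tune.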

Batch Size:

 The number of samples used to train at any one time.
 Could be all, one, or some of your data (batch, stochastic, or mini-batch)
 Often 32, 64, or 128
 Can be calculated from the infrastructure

Epochs:

 The number of times that the algorithm will process the entire training data.
 Each epoch contains one or more batches
 Each epoch should see the model get closer to the desired state
 Usually a high number: 10, 100, 1000 and up

