Data Science on AWS: Implementing End-to-End, Continuous AI and Machine Learning Pipelines (Early Edition), Chris Fregly
Data Science on AWS
Implementing End-to-End, Continuous AI and
Machine Learning Pipelines
With Early Release ebooks, you get books in their earliest form—
the author’s raw and unedited content as they write—so you can
take advantage of these technologies long before the official
release of these titles.
NOTE
S3 is Amazon’s Simple Storage Service. S3 provides a simple web
service interface that you can use to store and retrieve any amount of
data. We will discuss this service in more detail in the next chapter.
You can tell Autopilot how many model candidates to explore. In the
process of building those candidates, Autopilot tries different
algorithms and algorithm settings. Autopilot also applies all needed
data transformations to your data to optimize the input for each
algorithm. The algorithm, configuration, and data transformation
code are then combined into a single ML pipeline definition. The
most promising pipelines are selected by Autopilot and used to find
the best performing model. Lastly, Autopilot shares the results in a
model leaderboard. You can use the best performing model as a
baseline and optimize the model even further. A second option is to
simply deploy the model and start predicting.
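To make the candidate limit concrete, here is a minimal sketch of how such a job might be requested through boto3's SageMaker client. The `create_auto_ml_job` operation and its `MaxCandidates` completion criterion are part of the SageMaker API; the helper names, bucket URIs, and role ARN are hypothetical placeholders, and the request is only assembled here, not sent.

```python
def build_autopilot_request(job_name, input_s3_uri, output_s3_uri,
                            role_arn, target_column, max_candidates=3):
    """Assemble keyword arguments for SageMaker's create_auto_ml_job call.

    All argument values are supplied by the caller; nothing here talks to AWS.
    """
    return {
        "AutoMLJobName": job_name,
        "InputDataConfig": [{
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            # Column that Autopilot should learn to predict.
            "TargetAttributeName": target_column,
        }],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        # Upper bound on how many model candidates Autopilot explores.
        "AutoMLJobConfig": {
            "CompletionCriteria": {"MaxCandidates": max_candidates}
        },
        "RoleArn": role_arn,
    }


def start_autopilot_job(request):
    """Submit the request; needs boto3 plus valid AWS credentials."""
    import boto3  # imported lazily so the builder works without AWS access
    return boto3.client("sagemaker").create_auto_ml_job(**request)


# Hypothetical values, for illustration only.
request = build_autopilot_request(
    job_name="demo-autopilot-job",
    input_s3_uri="s3://example-bucket/input/",
    output_s3_uri="s3://example-bucket/output/",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    target_column="star_rating",
    max_candidates=5,
)
```

Keeping the request builder separate from the submitting call lets you inspect or version the configuration before any AWS resources are created.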
Another highlight of Autopilot is that it provides full visibility
into each of those steps and shares all code needed to reproduce
the results. AWS calls this a “white-box” approach. This white-box
approach to AutoML is unusual. Let’s explore the white-box vs.
black-box approach to AutoML a bit further.
Autopilot not only shares the models but also logs all observed
metrics and generates Jupyter notebooks that contain the code to
reproduce the model pipelines. The Data Exploration notebook
contains the data analysis results and identifies potential data quality
issues, such as missing values, that might impact model performance
if not addressed. The Candidate Definition notebook highlights the
best algorithms to learn your given dataset, as well as the code and
configuration needed to use your dataset with each algorithm.
NOTE
The Jupyter notebooks are available after the first data analysis step.
You can configure Autopilot to stop after this step if you want to iterate
quickly over your data before starting the actual model tuning step.
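As a rough sketch of that workflow with boto3: the `create_auto_ml_job` operation accepts a `GenerateCandidateDefinitionsOnly` flag, and `describe_auto_ml_job` reports where the two notebooks were written. The helper names below are hypothetical, and the client is passed in so the functions can be exercised without AWS access.

```python
def analysis_only(request):
    """Return a copy of a create_auto_ml_job request that stops after
    the data analysis step, so only the notebooks are generated."""
    updated = dict(request)
    updated["GenerateCandidateDefinitionsOnly"] = True
    return updated


def notebook_locations(sm_client, job_name):
    """Look up the S3 locations of the generated notebooks.

    sm_client is expected to behave like boto3.client("sagemaker").
    Values are None until Autopilot has produced the notebooks.
    """
    job = sm_client.describe_auto_ml_job(AutoMLJobName=job_name)
    artifacts = job.get("AutoMLJobArtifacts", {})
    return {
        "data_exploration": artifacts.get("DataExplorationNotebookLocation"),
        "candidate_definition": artifacts.get("CandidateDefinitionNotebookLocation"),
    }
```

With a real client you would call `notebook_locations(boto3.client("sagemaker"), job_name)` once the analysis step has finished and then download the notebooks from the returned S3 URIs.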
Use SageMaker Experiments in Autopilot
Autopilot uses SageMaker Experiments to keep track of all data
analysis, feature engineering and model training/tuning jobs. This
feature of the broader Amazon SageMaker family of ML services
helps you organize, track, compare and evaluate machine learning
experiments. SageMaker Experiments enables model versioning and
lineage tracking across all phases of the ML lifecycle.
An experiment consists of trials and training jobs as shown in
Figure 1-5. A trial is a collection of training steps and metadata for
those steps. The training steps typically include data preprocessing,
model training, and model tuning. Metadata includes dataset
locations, algorithm hyper-parameters, output artifacts, and
performance metrics.
Figure 1-5. SageMaker Experiments.
You can explore and manage Autopilot experiments and trials either
through the UI or using boto3, the AWS SDK for Python
[https://github.com/boto/boto3]. Let’s have a look at both options
and start an Autopilot experiment to build a custom classifier model.
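For the programmatic route, a minimal sketch might walk an experiment's trials and their components using the SageMaker API operations `list_trials` and `list_trial_components`. The helper name and the experiment name are hypothetical, and for brevity only the first page of each listing is read.

```python
def list_trials_with_components(sm_client, experiment_name):
    """Map each trial name in an experiment to its trial component names.

    sm_client is expected to behave like boto3.client("sagemaker").
    Only the first page of results is read; follow NextToken for more.
    """
    result = {}
    trials = sm_client.list_trials(ExperimentName=experiment_name)
    for trial in trials.get("TrialSummaries", []):
        trial_name = trial["TrialName"]
        components = sm_client.list_trial_components(TrialName=trial_name)
        result[trial_name] = [
            component["TrialComponentName"]
            for component in components.get("TrialComponentSummaries", [])
        ]
    return result
```

With real credentials you would call `list_trials_with_components(boto3.client("sagemaker"), "<your-experiment-name>")`.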
As input data, we leverage samples from the Amazon Customer
Reviews Dataset [https://s3.amazonaws.com/amazon-reviews-pds/readme.html].
This dataset is a collection of over 150 million
product reviews on Amazon.com from 1995 to 2015. Those product
reviews and star ratings are a popular customer feature of
Amazon.com. Star rating 5 is the best and 1 is the worst. We will
describe and explore this dataset in much more detail in the next
chapters.
You will find the Autopilot UI when you click the laboratory flask icon
in the left-side menu as shown in Figure 1-9. Once there, click on
Create experiment to create and configure our first Autopilot
experiment.
Figure 1-9. Create an Autopilot experiment.
In preparation for our Autopilot experiment, we use a subset of the
Amazon Customer Reviews Dataset to train our model. We want to
train a classifier to predict the star rating for a given review,
so we use only the star_rating and review_body
columns:
star_rating,review_body
5,"GOOD, GREAT, WONDERFUL, MORE THAN ADEQUATE AND EXACTLY
WHAT I NEED FOR MY COMPUTER. NONE BETTER OR GOOD TO USE."
2,"Even though it does the same job as TurboTax, it isn't as
user friendly as TurboTax. I guess you get what you paid
for."
4,"Pretty easy to use. No issues."
…
NOTE
In other scenarios, you will likely want to use more columns from your
dataset and let Autopilot choose the most important ones. In our
example, however, we want to keep things simple and just use the
star_rating and review_body columns to focus on the technical aspects
of Autopilot.
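One way to produce such a two-column file is sketched below with Python's standard csv module. It assumes the raw reviews arrive as tab-separated text with a header row, which this chapter does not state; the function name and the sample rows are illustrative only.

```python
import csv
import io


def select_columns(tsv_text, columns=("star_rating", "review_body")):
    """Keep only the given columns of tab-separated text; return CSV text.

    Assumes the input has a header row naming every column.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=list(columns),
        extrasaction="ignore",  # silently drop columns we did not ask for
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Tiny made-up sample with one extra column that gets dropped.
sample = (
    "marketplace\tstar_rating\treview_body\n"
    "US\t4\tPretty easy to use. No issues.\n"
    "US\t2\tOk, I guess\n"
)
subset = select_columns(sample)
```

The writer quotes a field only when it has to, for example when the review text contains a comma.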
If you have a look at the S3 output bucket, you can find the
generated notebooks, code, and transformed data in the following
structure:
amazon-customer-reviews/
    data-processor-models/
        amazon-cus-dpp0-1-xxx/
            output/model.tar.gz
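To inspect that structure programmatically, a minimal sketch could list the object keys under the job's prefix with boto3's `list_objects_v2` paginator. The helper name and the bucket name are hypothetical, and the client is passed in so the function can be tried against a stub.

```python
def list_output_keys(s3_client, bucket, prefix):
    """Collect every object key stored under a prefix.

    s3_client is expected to behave like boto3.client("s3").
    """
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects carry no "Contents" entry.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys
```

For example, `list_output_keys(boto3.client("s3"), "<your-output-bucket>", "amazon-customer-reviews/")` would return keys such as the `model.tar.gz` artifacts, provided your credentials can read the bucket.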