Julian Sharp
Exam Ref AI-900 Microsoft Azure AI Fundamentals
Published with the authorization of Microsoft Corporation by:
Pearson Education, Inc.
All rights reserved. This publication is protected by copyright, and permission must be obtained from
the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in
any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For
information regarding permissions, request forms, and the appropriate contacts within the Pearson
Education Global Rights & Permissions Department, please visit www.pearson.com/permissions.
No patent liability is assumed with respect to the use of the information contained herein. Although
every precaution has been taken in the preparation of this book, the publisher and author assume no
responsibility for errors or omissions. Nor is any liability assumed for damages resulting from the use
of the information contained herein.
ISBN-13: 978-0-13-735803-8
ISBN-10: 0-13-735803-2
TRADEMARKS
Microsoft and the trademarks listed at http://www.microsoft.com on the “Trademarks” webpage are
trademarks of the Microsoft group of companies. All other marks are property of their respective
owners.
SPECIAL SALES
For information about buying this title in bulk quantities, or for special sales opportunities (which
may include electronic versions; custom cover designs; and content particular to your business,
training goals, marketing focus, or branding interests), please contact our corporate sales department
at [email protected] or (800) 382-3419.
EDITOR-IN-CHIEF
Brett Bartow
EXECUTIVE EDITOR
Loretta Yates
SPONSORING EDITOR
Charvi Arora
DEVELOPMENT EDITOR
Songlin Qiu
MANAGING EDITOR
Sandra Schroeder
COPY EDITOR
Sarah Kearns
INDEXER
Timothy Wright
PROOFREADER
Donna Mulder
TECHNICAL EDITOR
Francesco Esposito
EDITORIAL ASSISTANT
Cindy Teeters
COVER DESIGNER
Twist Creative, Seattle
COMPOSITOR
codeMantra
GRAPHICS
codeMantra
Pearson’s Commitment to Diversity,
Equity, and Inclusion
Pearson is dedicated to creating bias-free content that reflects the diversity
of all learners. We embrace the many dimensions of diversity, including but
not limited to race, ethnicity, gender, socioeconomic status, ability, age,
sexual orientation, and religious or political beliefs.
Education is a powerful force for equity and change in our world. It has the
potential to deliver opportunities that improve lives and enable economic
mobility. As we work with authors to create content for every product and
service, we acknowledge our responsibility to demonstrate inclusivity and
incorporate diverse scholarship so that everyone can achieve their potential
through learning. As the world’s leading learning company, we have a duty
to help drive change and live up to our purpose to help more people create a
better life for themselves and to create a better world.
Our ambition is to purposefully contribute to a world where:
While we work hard to present unbiased content, we want to hear from you
about any concerns or needs with this Pearson product so that we can
investigate and address them.
Contents
Introduction
Organization of this book
Preparing for the exam
Microsoft certifications
Quick access to online references
Errata, updates, & book support
Stay in touch
Index
Acknowledgments
I’d like to thank the following people without whom this book would not
have been possible.
Thank you to Loretta and Charvi for your patience and encouragement
with this project. To the various editors for correcting my errors. To
Francesco for the quality of the review and your helpful suggestions. To
Andrew Bettany for originally recommending me for this book and also for
involving me in your Cloud Ready Graduate program, where I learned how
to refine the content and the messaging around AI Fundamentals for a
wider audience, making this a much better book as a result.
About the author
JULIAN SHARP is a solutions architect, trainer, and Microsoft Business
Applications MVP with over 30 years of experience in IT. He completed his
MA in Mathematics at the University of Cambridge. Julian has spoken at
Microsoft Ignite and many other community events. For the past 15 years,
he has been a Microsoft Certified Trainer delivering certification training
around Dynamics 365, Azure, and the Power Platform. He has taught
thousands of students with a high pass rate. Julian has a passion for using
Artificial Intelligence to enhance the user experience and customer data in
the solutions that he designs.
Introduction
Microsoft certifications
Microsoft certifications distinguish you by proving your command of a
broad set of skills and experience with current Microsoft products and
technologies. The exams and corresponding certifications are developed to
validate your mastery of critical competencies as you design and develop,
or implement and support, solutions with Microsoft products and
technologies both on-premises and in the cloud. Certification brings a
variety of benefits to the individual and to employers and organizations.
Stay in touch
Let’s keep the conversation going! We’re on Twitter:
http://twitter.com/MicrosoftPress.
CHAPTER 1
Cognitive Services A set of prebuilt services that you can easily use
in your applications.
Azure Bot Service A service to help create and deploy chatbots and
intelligent agents.
Azure Machine Learning A broad range of tools and services that
allow you to create your own custom AI.
Storage
Compute
Web Apps
HDInsight
Data Factory
Cosmos DB
Azure Functions
Azure Kubernetes Service (AKS)
Example ML architecture
To explain how Azure services support Azure Machine Learning, consider
the scenario of a company that wants to provide recommendations to its
users. By providing personalized, targeted recommendations, the company
expects users to purchase more of its products and user satisfaction to
increase.
Figure 1-1 shows an example of an ML architecture to support
recommendations.
For each model type, you will need to understand the following:
Image processing
Natural Language Processing (NLP)
QnA Maker A tool to build a bot using existing support and other
documentation.
Azure Bot Service Tools to build, test, deploy, and manage bots.
Both QnA Maker and the Azure Bot Service leverage the Language
Understanding (LUIS) service in Cognitive Services.
We will look at bots in more detail in Chapter 5.
EXAM TIP
You will be asked in the exam to pick the correct AI workload from
an example given, or to identify an example for a specified AI
workload. You should make sure you understand the use cases for
each workload.
Prediction
Prediction, or forecasting, is where the computer identifies patterns in your
historical data, and through machine learning associates the patterns with
outcomes. You can then use the prediction model to predict the outcome for
new data.
Types of predictions include the following:
Binary prediction There are two possible outcomes for a question—
yes/no or true/false.
Multiple outcome prediction The question can be answered from a
list of two or more outcomes.
Numerical prediction The question is answered with a continuous
number, not explicitly limited to a specific range of values.
Anomaly detection
Anomaly detection analyzes data over time and identifies unusual changes,
often for real-time data streams.
Anomaly detection, also known as outlier detection, can find dips and
spikes that may indicate a potential issue. Such issues are hard to spot when
analyzing aggregate data, as the data points are hidden in the vast volume
of data.
Anomaly detection can identify trend changes. Typically, the anomaly
will indicate problems such as a sticking valve, payment fraud, a change in
the level of vibration on a bearing, or errors in text.
Anomaly detection enables pre-emptive action to be taken before a
problem becomes critical or adversely affects business operations.
NOTE ANOMALY DETECTION
Anomaly detection does not predict when a failure will occur; for
this, you should use a prediction model.
There are several algorithms that can be used for anomaly detection.
The Azure Anomaly Detector service selects the best algorithm based on
the data, making Anomaly Detector easy to use with a single API call.
Anomaly Detector can also run in a Docker container, so it can be deployed
at the edge on devices themselves.
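To make the single-call nature of the service concrete, the following is a minimal Python sketch (illustration only, not exam content) that sends a short daily time series to an Anomaly Detector resource. The endpoint, key, and values are placeholders; the request and response shapes reflect the v1.0 batch detection operation.
from datetime import datetime, timedelta
import requests

# Placeholders for an existing Anomaly Detector resource.
endpoint = "https://<your-resource-name>.cognitiveservices.azure.com"
key = "<your-key>"

# Two weeks of daily values with one obvious spike (the service requires
# at least 12 points in the series).
values = [32.0, 31.5, 33.2, 30.8, 31.9, 32.4, 31.1,
          95.0, 32.6, 31.7, 33.0, 32.2, 31.4, 32.8]
start = datetime(2021, 3, 1)
series = [{"timestamp": (start + timedelta(days=i)).isoformat() + "Z",
           "value": v} for i, v in enumerate(values)]

# Batch ("entire") detection: the whole series is analyzed in one API call.
response = requests.post(
    f"{endpoint}/anomalydetector/v1.0/timeseries/entire/detect",
    headers={"Ocp-Apim-Subscription-Key": key},
    json={"series": series, "granularity": "daily"},
)
response.raise_for_status()
print(response.json()["isAnomaly"])   # True at the position of the spike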
Following are some examples of the use of anomaly detection:
Computer vision
Computer vision is the processing of still images and video streams.
Computer vision can interpret the image and provide detail and
understanding about the image in computer-readable form.
Computer vision can determine if the image contains a specific object
(object detection) and can extract details from the image, such as colors or
text.
Computer vision can:
Describe an image
Categorize an image
Tag an image
Detect objects
Detect faces
Identify brands and products
Identify famous people
Identify landmarks
Extract text
Knowledge mining
Knowledge mining is the process of extracting key insights from structured
and unstructured data sources.
Knowledge mining uses a combination of AI services, including Azure
Cognitive Search, to extract meaning and relationships from large amounts
of information. This information can be held in structured and unstructured
data sources, documents, and databases. Knowledge mining uncovers
hidden insights in your data.
Microsoft provides a Knowledge Mining Solution Accelerator that helps
you ingest data from different data and document sources, enrich and index
the data, and explore the results through a user interface.
Conversational AI
Conversational AI is the process of building AI agents to take part in
conversations with humans. Conversational AI is commonly experienced by
humans as chatbots on websites and other systems.
AI agents (bots) engage in conversations (dialogs) with human users.
Bots use natural language processing to make sense of human input,
identify the actions the human wants to perform, and identify the entity on
which the actions are to be performed. Bots can prompt the human for the
information required to complete a transaction.
There are three common types of bot that you may encounter:
Webchat
Telephone voice menus (IVR)
Personal Digital Assistants
EXAM TIP
Make sure you can describe each principle of Responsible AI in a
single sentence.
Following are some examples where the Fairness principle can have a
significant impact:
Rigorous testing
Works as expected
Eliminates threat of harm to human life
Autonomous vehicles
Healthcare diagnosis
Respect privacy.
Be secure.
Avoid disclosing personal data.
Empowering everyone
Engaging all communities in the world
Intentionally designing for the inclusivity principle
Governance framework
Ethical policies
Legal standards
Thought experiment
In this thought experiment, demonstrate your skills and knowledge of the
topics covered in this chapter. You can find the answers in the section that
follows.
You work for Contoso Medical Group (CMG), and your management is
interested in using AI in your applications and operations. CMG manages
and monitors drug trials, evaluating the efficacy of the treatments.
The CMG IT department is resource-constrained, and they do not have
data scientists or skilled AI developers available.
Having timely and accurate responses from patients improves the
accuracy of the analysis performed. CMG has created an app to capture and
track a patient’s daily symptoms. CMG has recently added the capability of
the app to take pictures to capture skin conditions. CMG is unable to
analyze the images due to the volume of images being captured. CMG is
concerned about the amount of data storage for these images, as well as
controlling access to the images.
CMG receives a lot of patient history and prescription records that are
keyed into CMG’s computer systems. These paper records are important
information used to track a patient’s response to drugs and treatments.
The support department is unable to handle the many inquiries CMG
receives. Customers are receiving inconsistent responses depending on
whom they speak to and how they are accessing customer support, whether
by phone, web, or mobile app.
Your manager has come to you asking for solutions that address these
issues. Whatever solution you offer must consider that the medical data in
this application is covered under HIPAA, and your manager wants CMG to
retain all control of the data. Your manager also wants to carefully control
costs.
You have decided that CMG can use AI, but there are several issues that
you need to resolve before proceeding.
Answer the following questions:
1. Which AI workload should you use for the customer support
department?
2. Which principle of Responsible AI should you employ to gain the trust
of users in your bot?
3. Which AI workload should you use to analyze the images for skin
conditions?
4. How can you address the storage requirements for the images?
5. Which principle of Responsible AI protects a patient’s personal
information?
6. Which AI workload could identify adverse reactions to a drug
treatment?
7. Which principle of Responsible AI requires rigorous testing of your
AI-based app?
You can probably see that there is a pattern that shows studying more
hours leads to a higher exam score and passing the exam. However, can you
see a pattern between the students’ academic backgrounds and whether they
pass or fail, and can you answer the question of how much completing
the labs affects their score? What if you were to have more information
about the student, and what if there were many more records of data? This
is where machine learning can help.
Supervised learning
In supervised learning, the existing data contains the desired outcome. In
machine learning, we say that the data contains a label. The labeled value is
the output we want our model to determine for new data. A label can either
be a value or a distinct category.
The other data that is supplied and that is used as inputs to the model are
called features. A supervised learning model uses the features and label to
train the model to fit the label to the features. After the model is trained,
supplying the model with the features for new data will predict the value, or
category, for the label.
You use supervised learning where you already have existing data that
contains both the features and the label.
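As an illustration only (using scikit-learn rather than the Azure tooling covered in this chapter, with made-up numbers), the following sketch shows the supervised pattern of fitting a label to features and then predicting the label for new data:
from sklearn.linear_model import LogisticRegression

# Features: hours studied and labs completed; label: 1 = pass, 0 = fail.
features = [[10, 4], [2, 0], [8, 3], [1, 1], [12, 5], [3, 1]]
labels = [1, 0, 1, 0, 1, 0]

model = LogisticRegression()
model.fit(features, labels)        # train: fit the label to the features

# Supplying the features for a new student predicts the label.
print(model.predict([[9, 4]]))     # e.g., [1] -> predicted pass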
Unsupervised learning
In unsupervised learning, we do not have the outcome or label in the data.
We use machine learning to determine the structure of the data and to look
for commonalities or similarities in the data. Unsupervised learning
separates the data based on the features.
You use unsupervised learning where you are trying to discover
something about your data that you do not already know.
Reinforcement learning
Reinforcement learning uses feedback to improve the outcomes from the
machine learning model. Reinforcement learning does not have labeled
data.
Reinforcement learning uses a computer program, an agent, to
determine if the outcome is optimal or not and feeds that back into the
model so it can learn from itself.
Reinforcement learning is used, for example, in building a model to play
chess and is commonly used in robotics.
EXAM TIP
For this exam, you do not need to know about the different
algorithms, but you must be able to differentiate between the
different learning models: regression, classification, and clustering.
We can see that the classification model correctly predicts all but two of
the results. If the model predicts a pass and the actual is a pass, this is a true
positive. If the model predicts a fail and the actual is a fail, this is a true
negative.
In a classification model, we are interested in where the model gets it
wrong. For student L, the model predicts a pass, but the actual result was a
fail—this is a false positive. Student E actually passed, but the model
predicted that the student would fail—this is a false negative.
EXAM TIP
How to extract and transform data is outside the scope of this exam.
We will start with the same dataset we used earlier in this chapter with
some additional exam results, as shown in the table in Figure 2-8.
Identify labels
If you are using supervised training—for example, a regression or a
classification model—then you need to select the label(s) from your dataset.
Labels are the columns in the dataset that the model predicts.
For a regression model, the Score column is the label you would choose,
as this is a numeric value. Regression models are used to predict a range of
values.
For a classification model, the Pass column is the label you would
choose as this column has distinct values. Classification models are used to
predict from a list of distinct categories.
Feature selection
A feature is a column in your dataset. You use features to train the model to
predict the outcome. Features are used to train the model to fit the label.
After training the model, you can supply new data containing the same
features, and the model will predict the value for the column you have
selected as the label.
The possible features in our dataset are the following:
Background
Hours Studied
Completed Labs
In the real world, you will have other possible features to choose from.
Feature selection is the process of selecting a subset of relevant features
to use when building and training the model. Feature selection restricts the
data to the most valuable inputs, reducing noise and improving training
performance.
Feature engineering
Feature engineering is the process of creating new features from raw data
to increase the predictive power of the machine learning model. Engineered
features capture additional information that is not available in the original
feature set.
Examples of feature engineering are as follows:
Aggregating data
Calculating a moving average
Calculating the difference over time
Converting text into a numeric value
Grouping data
Models train better with numeric data rather than text strings. In some
circumstances, data that visually appears to be numeric may be held as text
strings, and you need to parse and convert the data type into a numeric
value.
In our dataset, the background column, the degree subject names for our
students, may not perform well when we evaluate our model. One option
might be to classify the degree subjects into humanities and sciences and
then to convert to a Boolean value, such as IsScienceSubject, with values of
1 for True and 0 for False.
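A minimal pandas sketch of that option, using made-up rows and an assumed list of science subjects, might look like this:
import pandas as pd

df = pd.DataFrame({
    "Background": ["Physics", "History", "Biology", "Literature"],
    "Hours Studied": [10, 4, 8, 6],
})

# Classify each degree subject and encode the result as a 0/1 feature.
science_subjects = {"Physics", "Biology", "Chemistry", "Mathematics"}
df["IsScienceSubject"] = df["Background"].isin(science_subjects).astype(int)
print(df[["Background", "IsScienceSubject"]])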
Bias
Bias in machine learning is the impact of erroneous assumptions that our
model makes about our data. Machine learning models depend on the
quality, objectivity, and quantity of data used to train it. Faulty, incomplete,
or prejudicial data can result in a poorly performing model.
In Chapter 1, we introduced the Fairness principle and how an AI model
should be concerned with how data influences the model’s prediction to
help eliminate bias. You should therefore be conscious of the provenance of
the data you are using in your model. You should evaluate the bias that
might be introduced by the data you have selected.
A common issue is that the algorithm is unable to learn the true signal
from the data, and instead, noise in the data can overly influence the model.
An example from computer vision is where the army attempted to build a
model that was able to find enemy tanks in photographs of landscapes. The
model was built with many different photographs with and without tanks in
them. The model performed well in testing and evaluation, but when
deployed, the model was unable to find tanks. Eventually, it was realized
that all pictures of tanks were taken on cloudy days, and all pictures
without tanks were taken on sunny days. They had built a model that
identifies whether a photograph was of a sunny or a cloudy day; the noise
of the sunlight biased the model. The problem of bias was resolved by
adding additional photographs into the dataset with varying degrees of
cloud cover.
It can be tempting to select all columns as features for your model. You
may then find when you evaluate the model that one column significantly
biases the model, with the model effectively ignoring the other columns.
You should consider removing that column as a feature if it is irrelevant.
Normalization
A common cause of bias in a model is numeric features that have
different ranges of values. Machine learning algorithms tend to be
influenced by the size of values, so if one feature ranges in values between
1 and 10 and another feature between 1 and 100, the latter column will bias
the model toward that feature.
You mitigate possible bias by normalizing the numeric features, so they
are on the same numeric scale.
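Min-max scaling is one common way to do this; the sketch below (illustrative only, using scikit-learn and made-up values) rescales two features with very different ranges onto the same 0–1 scale:
from sklearn.preprocessing import MinMaxScaler

# Feature A ranges from 1 to 10; feature B ranges from 10 to 100.
data = [[1, 10], [5, 55], [10, 100]]

scaled = MinMaxScaler().fit_transform(data)
print(scaled)   # both columns now lie between 0.0 and 1.0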
After feature selection, feature engineering, and normalization, our
dataset might appear as in the table in Figure 2-9.
FIGURE 2-9 Normalized dataset
Training The training dataset is the sample of data used to train the
model. It is the largest sample of data used when creating a machine
learning model.
Testing The testing, or validation, dataset is a second sample of data
used to validate the model and check whether it can correctly predict,
or classify, using data it has not seen before.
The algorithm finds patterns and relationships in the training data that
map the input data features to the label that you want to predict. The
algorithm outputs a machine learning model that captures these patterns.
Training a model can take a significant amount of time and processing
power. The cloud has enabled data scientists to use the scalability of the
cloud to build models more quickly and with more data than can be
achieved with on-premises hardware.
After training, you use the model to predict the label based on its
features. You provide the model with new input containing the features
(Hours Studied, Completed Labs) and the model will return the predicted
label (Score or Pass) for that student.
The length of the lines indicates the size of residual values in the model.
A model is considered to fit the data well if the difference between actual
and predicted values is small.
The following metrics can be used when evaluating regression models:
Mean absolute error (MAE) Measures how close the predictions are
to the actual values; lower is better.
Root mean squared error (RMSE) The square root of the average
squared distance between the actual and the predicted values; lower is
better.
Relative absolute error (RAE) Relative absolute difference between
expected and actual values; lower is better.
Relative squared error (RSE) The total squared error of the
predicted values divided by the total squared error of the actual
values; lower is better.
Mean Zero One Error (MZOE) Indicates whether the prediction was
correct (0) or incorrect (1).
Coefficient of determination (R2 or R-squared) A measure of how much of
the variance from the mean is explained by the model’s predictions; the
closer to 1, the better the model is performing.
We will see later that Azure Machine Learning calculates these metrics
for us.
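If you want to see how a few of these metrics are produced, the following sketch computes MAE, RMSE, and R-squared by hand for made-up actual and predicted exam scores (scikit-learn here is for illustration only):
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

actual = np.array([500, 623, 710, 450])
predicted = np.array([520, 600, 695, 470])

mae = mean_absolute_error(actual, predicted)
rmse = np.sqrt(mean_squared_error(actual, predicted))   # RMSE = sqrt(MSE)
r2 = r2_score(actual, predicted)

print(f"MAE={mae:.1f}  RMSE={rmse:.1f}  R2={r2:.3f}")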
True Positive The model predicted true, and the actual is true.
True Negative The model predicted false, and the actual is false.
False Positive The model predicted true, and the actual is false.
False Negative The model predicted false, and the actual is true.
The total number of true positives is shown in the top-left corner, and
the total number of true negatives is shown in the bottom-right corner. The
total number of false positives is shown in the top-right corner, and the total
number of false negatives is shown in the bottom-left corner, as shown in
Figure 2-12.
From the values in the confusion matrix, you can calculate metrics to
measure the performance of the model:
Accuracy The number of true positives and true negatives; the total
of correct predictions, divided by the total number of predictions.
Precision The number of true positives divided by the sum of the
number of true positives and false positives.
Recall The number of true positives divided by the sum of the
number of true positives and false negatives.
F-score Combines precision and recall as a weighted mean value.
Area Under Curve (AUC) The area under the curve of the true positive
rate plotted against the false positive rate; a measure of how well the
model separates the two classes.
All these metrics are scored between 0 and 1, with closer to 1 being
better.
We will see later that Azure Machine Learning generates the confusion
matrix and calculates these metrics for us.
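As a quick illustration of the formulas above, the following sketch computes the metrics directly from the four cells of a made-up confusion matrix (the F-score shown is the common F1 variant, which weights precision and recall equally):
# Made-up confusion matrix cell counts.
tp, tn, fp, fn = 8, 7, 2, 3

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_score = 2 * precision * recall / (precision + recall)

print(f"Accuracy={accuracy:.2f}  Precision={precision:.2f}  "
      f"Recall={recall:.2f}  F1={f1_score:.2f}")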
If you are used to tools such as PyCharm or Jupyter notebooks, you can
use these within Azure and leverage other Azure services such as compute
and storage.
If you are used to frameworks such as PyTorch, Scikit-Learn,
TensorFlow, or ONNX, you can use these frameworks within Azure.
If you are used to using Apache Spark, you can use Azure Databricks,
Microsoft’s implementation of Apache Spark that integrates tightly with
other Azure services.
If you are used to Microsoft SQL Server machine learning, you can use
an Azure Data Science virtual machine (DSVM), which comes with ML
tools and R and Python installed.
You can also configure and use your own virtual machine configured
using the tools and services that you may prefer.
As there are many different options available, we will focus on the
native Azure services provided by Microsoft.
You will need to select the subscription, resource group, and region
where the resource is to be deployed. You will then need to create a unique
name for the workspace. There are four related Azure services:
Storage account The default datastore for the workspace for data
used to train models as well as files used and generated by the
workspace.
Key vault Securely stores secrets such as authentication keys and
credentials that are used by the workspace.
Application insights Stores monitoring information for your
deployed models.
Container registry Stores Docker images used in training and
deployments.
You can either select existing resources or create new resources for
these related services.
Clicking on Review + create will validate the options. You then click on
Create to create the resource. The resource will be deployed after a few
minutes.
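If you prefer code to the portal, the workspace can also be created with the Azure Machine Learning Python SDK (azureml-core). The sketch below is illustrative; the workspace name, subscription ID, resource group, and region are placeholders.
from azureml.core import Workspace

ws = Workspace.create(name="aml-workspace",
                      subscription_id="<subscription-id>",
                      resource_group="rg-machine-learning",
                      location="westeurope",
                      create_resource_group=True)

# The related storage account, key vault, and Application Insights
# resources are provisioned alongside the workspace.
ws.write_config()   # saves config.json so the workspace can be reloaded later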
Once your resource has been created, you can view the resources
associated with the workspace, as shown in Figure 2-16.
To access the workspace, you need to open the workspace in the portal.
This will display the details of the workspace, as shown in Figure 2-17.
Before you can start working with your workspace, you need to assign
compute resources to the workspace. These are referred to as compute
targets:
NOTE COMPUTE
You need compute targets to run machine learning workloads in
the cloud instead of on your local computer. You can perform
model development on local compute with low volumes of data
using Visual Studio Code.
The high-level process for an Azure Machine Learning workspace is
shown in Figure 2-18.
EXAM TIP
For this exam, you do not need to know how to write code for
machine learning, but you do need to know about the different
languages and tools for building models.
Machine Learning studio allows you to create and manage the assets in
your Machine Learning workspace using a graphical user interface.
Author
Azure Machine Learning studio supports both no-code and code-first
experiences. You can build, train, and run machine learning models with
Automated Machine Learning, Notebooks, and a Visual drag-and-drop
designer.
Azure Machine Learning studio supports the use of Jupyter notebooks
that use the Python SDK to create and run machine learning models.
Automated Machine Learning (AutoML) is a no-code tool that performs
many of the steps required to build and train a model automatically,
reducing the need for deep machine learning skills and domain knowledge.
You just select the training data and the required model type, and AutoML
determines the best algorithm to use and trains the model.
The Designer (drag-and-drop ML) is a no-code tool that allows you to
build pipelines for data preparation and model creation.
The AutoML and Designer tools are explained in Skill 2.4 later in this
chapter.
Compute
Before you can start ingesting data or building a model, you must first
assign a compute instance. A compute instance is a configured development
virtual machine environment for machine learning. A compute instance is
used as a compute target for authoring and training models for development
and testing purposes.
Clicking on Compute in the left-hand navigation pane of Azure Machine
Learning studio displays the Compute options, as shown in Figure 2-21.
FIGURE 2-21 Machine Learning studio compute
After clicking on the + New button, the Create compute instance pane
opens, as shown in Figure 2-22.
FIGURE 2-22 Create a compute instance
You will need to provide a unique name for the instance; select the
machine type and size. You then click on Create to create the virtual
machine. The virtual machine will be created and started after a few
minutes. You will be able to see the state of the virtual machine, as shown
in Figure 2-23.
FIGURE 2-23 Compute instances
You can stop the virtual machine used for the compute instance by
clicking on the stop button, as shown in Figure 2-24.
Ingestion
Each Machine Learning workspace has two built-in datastores: one for data
used for training and evaluating models and another for files used by
machine learning, such as logs and output files.
Figure 2-25 shows the built-in datastores in the Machine Learning
workspace.
When you click on the + New datastore button, you can add in the
following existing datastores:
If you have existing data in Azure SQL database, you supply the details
of the Azure SQL database, as shown in Figure 2-26.
FIGURE 2-26 Add Azure SQL database
If you want to import your data into the Azure Machine Learning
workspace, you register datasets using Machine Learning studio. You can
create a dataset from:
When you import data, you must define the dataset type as either tabular
or file:
You must enter the name of the dataset, select the dataset type as either
Tabular or File. Clicking on Next displays the next step in the wizard, as
shown in Figure 2-28.
FIGURE 2-28 Create a dataset from the local files wizard, step 2
You select the datastore into which the data will be imported. This will
normally be the built-in blob datastore, but you can add your own datastore
if required. You can then upload a single file or select an entire folder of
files to upload. Clicking on Next will parse the selected file(s) and display
the next step in the wizard, as shown in Figure 2-29.
FIGURE 2-29 Create a dataset from the local files wizard, step 3
You can review the tabular data to ensure it has been parsed correctly.
Clicking on Next displays the next step in the wizard, as shown in Figure 2-
30.
FIGURE 2-30 Create a dataset from the local files wizard, step 4
You can exclude columns from being imported and correct any data type
for each column. Clicking on Next displays the final step in the wizard, as
shown in Figure 2-31.
FIGURE 2-31 Create a dataset from the local files wizard, step 5
Clicking on Create will import the data and register the dataset, as
shown in Figure 2-32.
FIGURE 2-32 Registered datasets
You can use publicly available data from URLs such as daily bike rental
data that can be found at http://aka.ms/bike-rentals. You can import this
dataset, as shown in Figure 2-33.
FIGURE 2-33 Create a dataset from the local files wizard, step 1
You can simply add these datasets to your workspace. To find out more
about these and other publicly available datasets, see
https://docs.microsoft.com/azure/open-datasets/dataset-catalog.
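A hedged Python SDK sketch of registering the bike rentals data as a tabular dataset is shown below; it assumes a workspace config.json saved earlier and that the URL resolves to delimited (CSV) data.
from azureml.core import Dataset, Workspace

ws = Workspace.from_config()

# Create a tabular dataset directly from the public URL and register it.
bike_data = Dataset.Tabular.from_delimited_files(path="http://aka.ms/bike-rentals")
bike_data = bike_data.register(workspace=ws,
                               name="bike-rentals",
                               description="Daily bike rental counts")

print(bike_data.take(3).to_pandas_dataframe())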
Preparation
Once you have ingested your data, you will need to prepare your data for
training and testing models. A good place to start is to explore the profile of
the data. In Machine Learning studio, you can generate and review the
profile of your data, as shown in Figure 2-35.
How you perform these actions will depend on the tool you choose to
build and train your model.
Data that is incomplete can cause issues when training your model. You
can clean missing data by methods such as the following:
Feature selection
Feature selection is the process of selecting a subset of the columns in the
dataset features to exclude features that are not relevant to the machine
learning problem that we are trying to resolve. Feature selection restricts the
data to the most valuable inputs, reducing noise and improving training
performance.
Feature selection has two main purposes:
Increase the model’s ability to classify data accurately by eliminating
features that are irrelevant, redundant, or highly correlated.
Increase the efficiency of the model training process.
If you can reduce the number of features without losing the variance and
patterns in the data, the time taken to train the model is minimized. For
instance, you can exclude features that are highly correlated with each other
as this just adds redundancy to the processing.
Azure Machine Learning has a module for Filter-Based Feature
Selection to assist in identifying features that are irrelevant. Azure Machine
Learning applies statistical analysis to determine which columns are more
predictive than the others and ranks the results. You can exclude the
columns that have a poor predictive effect.
Feature engineering
Feature engineering is the process of creating new features from raw data
to increase the predictive power of the machine learning model. Engineered
features capture additional information that is not available in the original
feature set. Examples of feature engineering are aggregating data,
calculating a moving average, and calculating the difference over time.
It can be beneficial to aggregate data in the source data, reducing the
amount of data imported and used to train the model.
In Azure Machine Learning, you can use modules such as feature
hashing that use Cognitive Services to turn text into indices. For example,
in our student data, we could apply feature hashing to convert the academic
subject for each student into a numeric hash value.
Binning is an example of feature engineering where the data is
segmented into groups of the same size. Binning is used when the
distribution of values in the data is skewed. Binning transforms continuous
numeric features into discrete categories.
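A small pandas sketch of binning (made-up scores, equal-sized groups via quantiles) looks like this:
import pandas as pd

scores = pd.Series([350, 480, 510, 620, 700, 745])

# Segment the continuous scores into three equally sized groups.
bins = pd.qcut(scores, q=3, labels=["low", "medium", "high"])
print(bins.tolist())   # ['low', 'low', 'medium', 'medium', 'high', 'high']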
A common example of feature engineering is around dates and times.
You can convert dates and times to the relative number of days, hours, and
minutes, or take two datetime columns and create the difference in
minutes between them. This might create a model that predicts more accurate
outcomes.
Another similar example of feature engineering is to extract several
features from a single column, such as splitting a date into its component parts.
The daily bike rentals data shows how such features have been
engineered.
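The sketch below (pandas, made-up rental dates) shows the kind of date-part features this produces:
import pandas as pd

df = pd.DataFrame({"rental_date": pd.to_datetime(
    ["2022-01-03", "2022-01-08", "2022-07-15"])})

# Expand the single date column into several learnable features.
df["year"] = df["rental_date"].dt.year
df["month"] = df["rental_date"].dt.month
df["day_of_week"] = df["rental_date"].dt.dayofweek      # 0 = Monday
df["is_weekend"] = (df["day_of_week"] >= 5).astype(int)
print(df)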
NOTE PIPELINES
Pipelines are more flexible and are not limited simply to training
and can include other steps, such as data ingestion and
preparation.
Compute cluster
Before you can train a model, you need to assign compute resources to your
workspace. A compute cluster is used to train models with the Azure
Machine Learning workspace. The cluster can also be used for generating
predictions on large amounts of data in batch mode.
Clicking on Compute in the left-hand navigation pane displays the
Compute options, as shown previously in Figure 2-21. Clicking the
Compute clusters tab displays any existing cluster. After clicking on the +
New button, the Create compute cluster pane opens, as shown in Figure 2-
36.
You will need to select the machine type and size. Clicking on Next
displays the cluster settings pane, as shown in Figure 2-37.
FIGURE 2-37 Create compute cluster settings
You will need to provide a unique name for the cluster and specify the
minimum and maximum number of nodes in the cluster. You then click on
Create to create the cluster. The cluster will be created after a few minutes.
You will be able to see the state of the cluster, as shown in Figure 2-38.
You can change the number of nodes used in the compute cluster by
changing the minimum and maximum number of nodes, as shown in Figure
2-39.
If you are not training models, you can set both numbers to zero.
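The same kind of cluster can be created from code with the Python SDK; the following is a sketch with placeholder names, mirroring the portal settings described above.
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

# A small autoscaling cluster: scales down to zero nodes when idle.
config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS11_V2",
                                               min_nodes=0,
                                               max_nodes=2)
cluster = ComputeTarget.create(ws, "cpu-cluster", config)
cluster.wait_for_completion(show_output=True)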
Training
Training a model requires a combination of dataset, algorithm, and
compute. Azure Machine Learning allows you to easily train a model, but
in the real world, you need to run multiple experiments with different
features and algorithms to build the best performing model.
Before training, you should have transformed and cleansed your dataset,
selected features, performed any feature engineering required, and
normalized the data. By properly preparing your dataset, you can improve
the accuracy of the trained model.
Azure Machine Learning logs metrics for every experiment, and you can
use these metrics to evaluate the model’s performance.
Once you have deployed your model, training does not stop. You may
get additional data, and you will need to train and evaluate the model. As
was discussed earlier, building models is an iterative process. But it is not
just iterative—it is an ongoing process. With Azure Machine Learning, you
can create new versions of your model and then deploy a version to replace
the existing deployed model. You can also revert to an older version of the
model if the version you have deployed does not perform well with new
data.
Many algorithms have parameters, known as hyperparameters, that you
can set. Hyperparameters control how model training is done, which can
have a significant impact on model accuracy. Azure Machine Learning has
a module that allows for tuning hyperparameters by iterating multiple times
with combinations of parameters to find the best-fit model.
With the K-means clustering algorithm, you can adjust the K, which is
the target number of clusters you want the model to find. Increasing K
increases the compute time and cost.
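To see K acting as a hyperparameter, the sketch below (scikit-learn, made-up two-dimensional points) clusters the same data with two different values of K:
from sklearn.cluster import KMeans

data = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.1, 4.9], [9.0, 1.0], [8.8, 1.2]]

# Changing the hyperparameter n_clusters (K) changes how the data is grouped.
for k in (2, 3):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    print(k, model.labels_.tolist())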
Scoring
Azure Machine Learning contains a module that can split data into training
and testing datasets. Once the model has been trained, you use the testing
dataset to score the model.
The testing dataset contains data that was not used to train the model—
data that is new to the model. You use the trained model to generate
predictions for each row in the testing dataset. Azure Machine Learning
enables visualization of the results from scoring, as shown in Figure 2-40
for a regression model using our student dataset and using the hours studied
as a single feature and the score as the label.
In the results, you can see that the prediction (the scored label) for the
first two rows is close to the actual score, but the other two rows are
showing higher residual errors. The third row has a predicted score of 649,
but the actual score was 500; the fourth row has a predicted score of 670,
but the actual score was 623.
Figure 2-41 shows the scores for a classification model using our
student dataset and using the hours studied and completed labs as features
and pass as the label.
FIGURE 2-41 Score results for classification
In the results, you can see that the prediction (the scored label) for the
first three rows is correct. The first row is a true negative, with the actual
and prediction both false with a probability of 50%. The second row is a
true positive, with the actual and prediction both true with a probability of
100%. The third row is a true negative, with the actual and prediction both
false, but with a probability of only 37.5%. The fourth row is a false
positive, with the actual being a fail but the model predicting a pass.
Once you have scored your model, you use the scores to generate
metrics to evaluate the model’s performance and how good the predictions
of the model are.
The sequence of the tasks when building a model is shown in Figure 2-
42.
As you can see from these metrics, this model does not perform well
with high error values and a low coefficient of determination. We will need
to select additional features and train the model to see if we can create a
better performing model.
Figure 2-44 shows the metrics for a classification model using our
student dataset and using the hours studied and completed labs as features
and pass as the label.
FIGURE 2-44 Metrics for a classification model
As you can see from these metrics, with a threshold set to 50%, the
model is only 50% accurate with a precision of 33%. We will need to add
additional data, use a different algorithm or perform feature engineering,
and train the model to see if we can create a better performing model.
Earlier in the chapter, we discussed bias in the dataset, where the
algorithm is unable to separate the true signal from the noise. This is often
caused by the dataset used for training. If we have used a split of the same
dataset for scoring and evaluating, then the model may appear to perform
well but does not generalize—in other words, it does not predict well with
new unseen data.
This is one of the most common problems with machine learning and is
known as overfitting. Overfitting means that the model does not generalize
well from training data to unseen data, especially data that is unlike the
training data. Common causes are bias in the training data or too many
features selected, meaning the model cannot distinguish between the signal
and the noise.
One way to avoid overfitting is to perform cross validation. In cross
validation, a dataset is repeatedly split into a training dataset and a
validation dataset. Each split is used to train and test the model. Cross-
validation evaluates both the dataset and the model, and it provides an idea
of how representative the dataset is and how sensitive the model is to
variations in input data.
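A brief scikit-learn sketch of k-fold cross-validation (illustrative only, made-up student data) shows the repeated split-train-score cycle:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

features = [[10, 4], [2, 0], [8, 3], [1, 1], [12, 5],
            [3, 1], [9, 4], [2, 1], [11, 5], [4, 2]]
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Each of the 5 folds is used once for validation and 4 times for training.
scores = cross_val_score(LogisticRegression(), features, labels, cv=5)
print(scores)          # one accuracy value per fold
print(scores.mean())   # an overall indication of how well the model generalizes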
For batch inference processing, you create a pipeline that includes steps
to load the input data, load the model, predict labels, and write the results to
a datastore, normally to a file or to a database.
For real-time processing, the model is deployed as a web service that
enables applications to request predictions via HTTP.
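A hedged sketch of such a request is shown below; the scoring URI, key, and input schema are placeholders and depend on how the model was deployed.
import json
import requests

scoring_uri = "http://<your-service>.azurecontainer.io/score"   # placeholder
key = "<service-key>"                                           # placeholder

payload = {"data": [[9, 4]]}   # features for one new student (assumed schema)
response = requests.post(
    scoring_uri,
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {key}"},
    data=json.dumps(payload),
)
print(response.json())   # the predicted label(s) returned by the web service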
If you have used a pipeline to build your model, you can create an
inference pipeline that performs the same steps for new data input, not the
sample data used in training. You can publish the inference pipeline as a
web service. A real-time inference pipeline must have at least one Web
Service Input module and one Web Service Output module. The Web
Service Input module is normally the first step in the pipeline. The pipeline
performs the same steps for new data input. The Web Service Output
module is normally the final step in the pipeline.
In Azure Machine Learning, when you publish a real-time inferencing
model, you deploy the model as a web service running in a Docker
container. There are three ways you can deploy the container:
You can also use ONNX (Open Neural Network Exchange) to export
your model and deploy it on other platforms, such as on a mobile device.
Automated ML trades data science skills for compute power and time.
A data scientist will evaluate and choose the features, algorithm, and
hyperparameters for the model. Automated ML simply tries all
combinations of algorithm, features, and parameters and then ranks the
results by the chosen metric.
Clicking on Automated ML in the left-hand navigation pane in the
Azure Machine Learning studio lists all previous ML runs and allows you
to create a new run. Clicking on + New Automated ML run opens the
Create a new Automated ML run pane, as shown in Figure 2-46.
FIGURE 2-46 Create a new Automated ML run – select dataset
EXAM TIP
Automated ML only supports supervised learning, as you must
specify a label for the model to fit the data to.
Clicking Next displays the final step in the wizard, as shown in Figure
2-48.
This summary page for the Automated ML run shows the status of the
run and the best model. Clicking on the Models tab allows you to review
each of the models and evaluate the metrics, as shown in Figure 2-51.
FIGURE 2-51 Automated ML models
You can use the information shown in the explanations to interpret the
model and build responsible models.
You can select the model and deploy it from within Automated ML to
either Azure Container Instances or to Azure Kubernetes Service. This will
register the model and deploy it as a web service.
In the left pane of the designer, you see the datasets in the workspace
and all modules that you can add to the designer canvas. In the middle pane
of the designer, you can see the workflow for the pipeline, with a dataset as
the first step at the top and the final evaluate step at the bottom. In the right-
hand pane, you can see the settings for the pipeline, including the compute
cluster selected.
Although you can use the designer as a no-code tool, you can also insert
steps into the workflow to run Python and R scripts.
Clicking on Submit will run the pipeline. As the pipeline is running, the
currently active step is highlighted, so you follow the progress of the
pipeline run, as shown in Figure 2-55.
FIGURE 2-55 Pipeline in progress
You can click on a step and see its properties. For example, the Split
Data step properties show the mode for splitting the data and the proportion
to split, as shown in Figure 2-56.
FIGURE 2-56 Step properties
When the pipeline run has completed, you can view the metrics
generated for the model by right-clicking on the Evaluate Model step, as
shown in Figure 2-57.
FIGURE 2-57 Evaluation metrics
Datasets has been expanded in the left-hand pane, showing four datasets
that include the student exam results dataset we registered earlier in this
chapter. A training pipeline always starts with a dataset. You can drag your
chosen dataset and drop it onto the canvas, as shown in Figure 2-59.
FIGURE 2-59 Dataset added to designer canvas
Before training, data often must be cleansed. There are many modules
available to transform and cleanse data. If you have numeric features, you
should normalize them. You need to drag the Normalize Data tile onto the
canvas. Next, you need to drag the output from the bottom of the dataset
step to the top of the normalize step, as shown in Figure 2-60.
After training, a Score Model module is used to score the testing dataset
from the Split Data step, as shown in Figure 2-62.
FIGURE 2-62 Score Model added to designer canvas
The Score Model module is required before you can evaluate the model.
The Score Model module is added between the Train Model and Evaluate
Model steps in the pipeline. Score Model has two inputs, one from the
Train Model and the other from Split Data. You take the second output
from the Split Data step and connect to the second input on the Score
Model.
The testing data includes the actual label values. We can compare the
actual values to the predicted labels generated in the Score Model step and
use the Evaluate Model module to calculate the metrics to help us
determine the performance, or accuracy, of the model. The complete
pipeline is shown in Figure 2-63.
FIGURE 2-63 Evaluate Model added to designer canvas
Clicking on the Register model button displays a dialog with the name
of the model. You can see the models in the Machine Learning studio, as
shown in Figure 2-66.
FIGURE 2-66 Model list
To use the model, you can either clone the pipeline and add in steps for
batch processing or create a real-time inference pipeline. For a real-time
inference pipeline, you need to replace the dataset with a Web Service Input
module, remove the evaluate steps, and replace with a Web Service Output
module, as shown in Figure 2-67.
FIGURE 2-67 Web Service Input and Output in a real-time inference
pipeline
You can select the model and deploy it from within Azure Machine
Learning studio to either Azure Container Instances or to Azure Kubernetes
Service. This deploys the model as a web service.
Chapter summary
In this chapter, you learned some of the general concepts related to
Artificial Intelligence. You learned about the features of common AI
workloads, and you learned about the principles of responsible AI. Here are
the key concepts from this chapter:
Machine learning is the basis for modern AI.
Machine learning uses data to train models.
Feature is the name for the data used as inputs to a model.
Label is the name for the data that the model predicts.
Supervised learning requires data with features and labels to train the
model.
Unsupervised learning trains with data without labels.
Regression models predict a value along a range of values.
Classification models predict a discrete class or category.
Regression and classification models are both examples of supervised
learning.
Clustering models group data by similarities in the data into discrete
categories.
Clustering is an example of unsupervised learning.
Feature selection is the process of selecting a subset of relevant
features for training a machine learning model.
Feature engineering is the process of creating new features from raw
data to increase the predictive power of the machine learning model.
Normalization is a common process in data preparation that changes
all numeric data to use the same scale to prevent bias.
You split your dataset into a training dataset and a testing dataset.
Training a model combines the training dataset with an algorithm.
After training is complete, you score and then evaluate the model.
Scoring is performed using the testing dataset. Scoring compares the
actual values with the predicted values from the model.
Evaluation generates metrics that determine how well the model
performs.
The metrics for regression models calculate the difference between the
actual and predicted values using averages and means.
The metrics for classification models measure whether the prediction
is a true or false positive or negative. A confusion matrix represents
the number of true/false positive/negative predictions. The metrics for
classification models are ratios of these predictions.
You need an Azure subscription to use Machine Learning on Azure.
You need to create an Azure Machine Learning workspace to build and
train models.
The Azure Machine Learning workspace manages assets associated
with machine learning, including datastores, datasets, experiments,
models, and compute.
Compute instances support data preparation and building of models.
Compute clusters support training of models.
You can deploy a model to Azure Container Instances or Azure
Kubernetes Service.
Azure Machine Learning studio can be used to author models using
Python with Jupyter notebooks, using Automated ML, or using the
drag-and-drop designer.
.NET is not supported in Azure Machine Learning studio.
You can build and manage models using Visual Studio Code with
.NET and Python.
Automated ML uses the power of the cloud to create multiple models
for the same data using different algorithms and then determines
which is the best-fit model.
Azure Machine Learning designer is a graphical user interface that
allows you to drag and drop modules into a workflow to build a model.
Thought experiment
In this thought experiment, demonstrate your skills and knowledge of the
topics covered in this chapter. You can find the answers in the section that
follows.
You work for Relecloud, an internet service provider that collects large
volumes of data about its customers and the services they use.
You are evaluating the potential use of machine learning models to
improve business decision making and in business operations.
Relecloud uses Azure for virtual machines but has no experience of
machine learning in Azure.
You need to advise on the types of machine learning models to build and
the languages and tools that you should use within your organization.
You also need to explain to IT what Azure resources are required to be
provisioned to use machine learning in Relecloud’s applications.
Answer the following questions:
1. Which machine learning type should you use to determine if a social
media post has positive or negative sentiment?
2. Which type of machine learning groups unlabeled data using
similarities in the data?
3. Which two datasets do you split your data into when building a
machine learning model?
4. Which metrics can you use to evaluate a regression machine learning
model?
5. Which metrics can you use to evaluate a classification machine
learning model?
6. Which compute target should you use for development of machine
learning models?
7. What is used to increase the predictive power of a machine learning
model?
8. What should you do to prevent your model from being biased by one
feature?
9. What should you do to enable a trained machine learning model to be
measured for accuracy?
10. What should you do after training your model prior to deploying your
model as a web service?
11. You use Automated ML to find the best model for your data. Which
option should you use to interpret and provide transparency for the
model selected?
12. Which two types of datasets can you register and use to train
Automatic Machine Learning (AutoML) models?
Decision
Language
Speech
Vision
Web search
The services in the Decision group help you make smarter decisions:
Multi-service resource
Single-service resource
After clicking on the Create button, the Create Cognitive Services pane
opens, as shown in Figure 3-2.
FIGURE 3-2 Creating a Cognitive Services resource
You will need to select the subscription, resource group, and region
where the resource is to be deployed. You will then need to create a unique
name for the service. This name will be the domain name for your endpoint
and so must be unique worldwide. You should then select your pricing tier.
There is only one pricing tier for the multi-service resource, Standard S0.
Clicking on Review + create will validate the options. You then click on
Create to create the resource. The resource will be deployed in a few
seconds.
You can create a Cognitive Services resource using the CLI as follows:
az cognitiveservices account create --name <unique name> \
    --resource-group <resource group name> --kind CognitiveServices \
    --sku S0 --location <region> --yes
Once your resource has been created, you will need to obtain the REST
API URL and the key to access the resource.
EXAM TIP
Practice creating single- and multi-service resources in the Azure
portal and make sure you know where the endpoint and keys can be
found.
Categorize images
Determine the image width and height
Detect common objects including people
Analyze faces
Detect adult content
Describe an image
Categorize an image
Tag an image
Figure 3-5 shows an example of object detection. Three cats have been
identified as objects and their coordinates indicated by the boxes drawn on
the image.
FIGURE 3-5 Example of object detection
Using OCR, you can extract details from invoices that have been sent
electronically or scanned from paper. These details can then be validated
against the expected details in your finance system.
Figure 3-6 shows an example of using OCR to extract text from an
image.
FIGURE 3-6 Example of extracting text from an image with OCR
The OCR service extracted the following pieces of text from the image:
220-240V ~AC
hp
LaserJet Pro M102w
Europe - Multilingual localization
Serial No.
VNF 4C29992
Product No.
G3Q35A
Option B19
Regulatory Model Number
SHNGC-1500-01
Made in Vietnam
Detect faces
Analyze facial features
Recognize faces
Identify famous people
The facial detection identified the face, drew a box around the face, and
supplied details such as wearing glasses, neutral emotion, not smiling, and
other facial characteristics.
Customer engagement in retail is an example of using facial recognition
to identify customers when they walk into a retail store.
Validating identity for access to business premises is an example of
facial detection and recognition. Facial detection and recognition can
identify a person in an image, and this can be used to permit access to a
secure location.
Recognition of famous people is a feature of domain-specific content
where thousands of images of well-known people have been added to the
computer vision model. Images can be tagged with the names of celebrities.
Face detection can be used to monitor a driver’s face. The angle, or head
pose, can be determined, and this can be used to tell if the driver is looking
at the road ahead, looking down at a mobile device, or showing signs of
tiredness.
Now that you have learned about the concepts of computer vision, let’s
look at the specific Computer Vision services provided by Azure Cognitive
Services.
Skill 3.2: Identify Azure tools and services for
computer vision tasks
Azure Cognitive Services provide pre-trained computer vision models that
cover most of the capabilities required for analyzing images and videos.
This section describes the capabilities of the computer vision services in
Azure Cognitive Services.
A focus of the Microsoft Azure AI Fundamentals certification is on the
capabilities of the Computer Vision service. This requires you to
understand how to use the Computer Vision service and especially how to
create your own custom models with the Custom Vision service.
EXAM TIP
You will need to be able to distinguish between the Computer
Vision, Custom Vision, and Face services.
Analyze image
The analyze operation extracts visual features from the image content.
The image can either be uploaded or, more commonly, you specify a
URL to where the image is stored.
You specify the features that you want to extract. If you do not specify
any features, the image categories are returned.
The request URL is formulated as follows:
https://{endpoint}/vision/v3.1/analyze[?visualFeatures][&details][&language]
The URL for the image is contained in the body of the request.
The visual features that you can request include categories, tags, description, faces, objects, brands, color, image type, and adult content. For example, requesting the color feature returns JSON such as the following:
"color": {
  "dominantColorForeground": "Black",
  "dominantColorBackground": "Grey",
  "dominantColors": ["Black", "Grey", "White"],
  "accentColor": "635D4F",
  "isBWImg": false
},
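As a rough sketch of calling this operation over REST (the endpoint, key, and image URL are placeholders), the key is passed in the Ocp-Apim-Subscription-Key header and the image URL in the request body:
curl -X POST "https://<endpoint>/vision/v3.1/analyze?visualFeatures=Description,Tags,Color" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://<your storage>/images/three-cats.jpg"}'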
Describe image
The describe operation generates description(s) of an image using complete
sentences. Content tags are generated from the various objects in the image.
One or more descriptions are generated. The sentences are evaluated,
and confidence scores are generated. A list of captions is returned ordered
from the highest confidence score to the lowest.
The request URL is formulated as follows:
https://{endpoint}/vision/v3.1/describe[?maxCandidates][&language]
The parameter maxCandidates specifies the number of descriptions to
return. The default is 1. The default language is English.
Following is the JSON returned for the image of the three cats used
earlier in this chapter:
"description": {
There are multiple tags related to the content in the image and a single
sentence describing the image with a confidence of 62.8%.
Detect objects
The detect operation detects objects in an image and provides coordinates
for each object detected. The objects are categorized using an 86-category
taxonomy for common objects.
The request URL is formulated as follows:
https://{endpoint}/vision/v3.1/detect
Following is the JSON returned for the image of the three cats used
earlier in this chapter:
"objects": [{
The detect operation identified three cats with a high level of confidence
and provided the coordinates for each cat.
Content tags
The tag operation generates a list of tags, based on the image and the
objects in the image. Tags are based on objects, people, and animals in the
image, along with the placing of the scene (setting) in the image.
The tags are provided as a simple list with confidence levels.
The request URL is formulated as follows:
https://{endpoint}/vision/v3.1/tag[?language]
Following is the JSON returned for the image of the three cats used
earlier in this chapter:
"tags": [{
The tag operation generated a list of tags in order of confidence. The cat
tag has the highest confidence score of 99.9%, and domestic cat has the lowest score of 28.7%.
Domain-specific content
There are two models in Computer Vision that have been trained on specific
sets of images: celebrities and landmarks.
Thumbnail generation
The Get thumbnail operation generates a thumbnail image by analyzing the
image, identifying the area of interest, and smart-cropping the image.
The generated thumbnail will differ depending on the parameters you
specify for height, width, and smart cropping.
The request URL is formulated as follows:
https://{endpoint}/vision/v3.1/generateThumbnail[?width][&height][&smartCropping]
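A minimal sketch of calling this operation (endpoint, key, and image URL are placeholders); the binary thumbnail returned is written to a local file:
curl -X POST "https://<endpoint>/vision/v3.1/generateThumbnail?width=100&height=100&smartCropping=true" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://<your storage>/images/three-cats.jpg"}' \
  --output thumbnail.jpg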
Read The latest text recognition model that can be used with images
and PDF documents. Read works asynchronously and must be used
with the Get Read Results operation.
OCR An older text recognition model that supports only images and
can only be used synchronously.
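Because Read is asynchronous, it is called in two steps, sketched below with placeholder values: the POST returns an Operation-Location header pointing at the results, which are then retrieved with Get Read Results once the status is succeeded.
# Step 1: submit the document; note the Operation-Location response header
curl -i -X POST "https://<endpoint>/vision/v3.1/read/analyze" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://<your storage>/documents/invoice.pdf"}'
# Step 2: poll the URL from the Operation-Location header until "status" is "succeeded"
curl "https://<endpoint>/vision/v3.1/read/analyzeResults/<operation ID>" \
  -H "Ocp-Apim-Subscription-Key: <key>"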
For the synchronous OCR operation, the JSON returned includes the pieces of text from the image, as shown next:
{ "language": "en", "textAngle": 0.0, "orientation": "Up",
"regions": [{
Content moderation
The analyze operation can identify images that are risky or inappropriate.
The Content Moderator service, although not part of Computer Vision (it is
in the Decision group of APIs), is closely related to it.
Content Moderator is used in social media platforms to moderate
messages and images. Content Moderator can be used in education to filter
content not suitable for minors.
Content Moderator includes the ability to detect and moderate:
Images Scans images for adult or racy content, detects text in images
with OCR, and detects faces.
Text Scans text for offensive or sexual content, profanity (in more
than 100 languages), and personally identifiable information (PII).
Video Scans videos for adult or racy content.
Custom terms You can supply a set of terms that the Content
Moderator can use to block or allow.
Custom images You can supply a set of custom images that the
Content Moderator can use to block or allow.
Image classification Tags an image using the labels defined for the
model.
Object detection Identifies objects using the tags and provides the
coordinates of objects in an image. Object detection is a type of
classification model.
You will need to use 30 of the images to train your model, so keep three
images for testing your model after you have trained it.
First, you need to create a Custom Vision service. Figure 3-11 shows the
pane in the Azure portal for creating a Custom Vision service.
FIGURE 3-11 Creating a Custom Vision resource
The domain is used to train the model. You should select the most
relevant domain that matches your scenario. You should use the General
domain if none of the domains are applicable.
Domains for image classification are as follows:
General
Food
Landmarks
Retail
General (compact)
Food (compact)
Landmarks (compact)
Retail (compact)
General [A1]
General (compact) [S1]
Domains for object detection are as follows:
General
Logo
Products on Shelves
General (compact)
General (compact) [S1]
General [A1]
Once the project is created, you should create your tags. In this exercise,
you will create three tags:
Apple
Banana
Orange
Next, you should upload your training images. Figure 3-13 shows the
Custom Vision project with the images uploaded and untagged.
You now need to click on each image. Custom Vision will attempt to
identify objects and highlight the object with a box. You can adjust and
resize the box and then tag the objects in the image, as shown in Figure 3-14.
FIGURE 3-14 Tagging objects
You will repeat tagging the objects for all the training images.
You will need at least 10 images for each tag, but for better
performance, you should have a minimum of 30 images. To train your
model, you should have a variety of images with different lighting,
orientation, sizes, and backgrounds.
Select the Tagged button in the left-hand pane to see your tagged
images.
You are now ready to train your model. Click on the Train button at the
top of the project window. There are two choices: Quick Training and Advanced Training.
The model has identified both the apple and the banana and drawn
boxes around the pieces of fruit. The objects are tagged, and the results
have high confidence scores of 95.2% and 73.7%.
To publish your model, click on the Publish button at the top of the
Performance tab shown in Figure 3-16. You will need to name your model
and select a Custom Vision Prediction resource.
Publishing will generate an endpoint URL and key so that your
applications can use your custom model.
Computer vision encompasses the following capabilities:
Object detection
Image classification
Content moderation
Optical character recognition (OCR)
Facial recognition
Landmark recognition
Custom Vision uses images and tags that you supply to train a custom
image recognition model. Custom Vision only has two of the capabilities:
Object detection
Image classification
The Face service can analyze facial attributes and perform recognition, including:
Gender
Age
Emotions
Similarity matching
Identity verification
The Face service can be deployed in the Azure portal by searching for
Face when creating a new resource. You must select your region and resource group, provide a unique name, and select the pricing tier: Free F0 or
Standard S0.
You can create Face resources using the CLI as follows:
az cognitiveservices account create --name <unique name> --resource-group <resource group name> --kind Face --sku F0 --location <region>
Detection
The Face service detects the human faces in an image and returns their
boxed coordinates. Face detection extracts face-related attributes, such as
head pose, emotion, hair, and glasses.
The Face service examines 27 facial landmarks, as shown in Figure 3-
17. The location of eyebrows, eyes, pupils, nose, mouth, and lips are the
facial landmarks used by the Face service.
The detection model returns a FaceId for each face it detects. This Id
can then be used by the face recognition operations described in the next
section.
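As a hedged sketch (endpoint, key, and image URL are placeholders), a detect request asks for a FaceId and a set of facial attributes:
curl -X POST "https://<endpoint>/face/v1.0/detect?returnFaceId=true&returnFaceAttributes=age,gender,smile,glasses,emotion,headPose,hair" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://<your storage>/images/portrait.jpg"}'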
The JSON returned using the detect operation on the image of the author
in Figure 3-7 is shown next:
{ "faceId": "aa2c934e-c0f9-42cd-8024-33ee14ae05af",
"smile": 0.011,
"gender": "male",
"age": 53.0,
"glasses": "ReadingGlasses",
As you can see, the attributes are mostly correct, with the exception of the hair color. Good results are expected here because the image in Figure 3-7 is a professionally taken photograph with good exposure and a neutral expression.
Recognition
The Face service can recognize known faces. Recognition can compare two
different faces to determine if they are similar (Similarity matching) or
belong to the same person (Identity verification).
There are four operations available in facial recognition:
Verify Evaluates whether two faces belong to the same person. The
Verify operation takes two detected faces and determines whether the
faces belong to the same person. This operation is used in security
scenarios.
Identify Matches faces to known people in a database. The Identify
operation takes one or more face(s) and returns a list of possible
matches with a confidence score between 0 and 1. This operation is
used for automatic image tagging in photo management software.
Find Similar Extracts faces that look like a person’s face. The Find
Similar operation takes a detected face and returns a subset of faces
that look similar from a list of faces you supply. This operation is used
when searching for a face in a set of images.
Group Divides a set of faces based on similarities. The Group
operation separates a list of faces into smaller groups based on the similarities of the faces.
You should not use the Identify or Group operations to evaluate whether
two faces belong to the same person. You should use the Verify operation
instead.
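A minimal sketch of the Verify operation (endpoint, key, and the FaceIds taken from two earlier detect calls are placeholders); the response indicates whether the two faces are identical and with what confidence:
curl -X POST "https://<endpoint>/face/v1.0/verify" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"faceId1": "<FaceId from first detect call>", "faceId2": "<FaceId from second detect call>"}'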
EXAM TIP
Ensure that you can determine the scenario for each of the four
facial recognition operations.
Computer Vision
Face
Video Analyzer for Media
EXAM TIP
You will need to be able to distinguish between Computer Vision,
Face, and Video Analyzer for Media.
Computer Vision can detect faces in images but can only provide basic
information about the person from the image of the face, such as the
estimated age and gender.
The Face service can detect faces in images and can also provide
information about the characteristics of the face. The Face service can also
perform the following:
Facial analysis
Face identification
Pose detection
The Video Analyzer for Media service can detect faces in video images
but can also perform face identification.
Here are some examples of the differences between these services:
The Face API can detect the angle at which a head is posed. Computer Vision
can detect faces but is not able to supply the angle of the head.
Video Analyzer for Media can detect faces but does not return the
attributes the Face API can return.
The Face API service is concerned with the details of faces. The Video
Analyzer for Media service can detect and identify people and brands
but not landmarks.
Custom Vision allows you to specify the labels for an image. The other
services cannot.
Computer Vision can identify landmarks in an image. The other
services cannot.
Date: 2019-06-10
Time: 13:59:00
Subtotal: 1098.99
Tax: 104.4
Total: 1203.39
Line items:
Item Quantity: 1
Item Quantity: 1
OCR
Read
Form Recognizer
Chapter summary
In this chapter, you learned some of the general concepts related to
computer vision. You learned about the types of computer vision, and you
learned about the services in Azure Cognitive Services related to computer
vision. Here are the key concepts from this chapter:
EXAM TIP
You need to be able to map the NLP workloads to the scenarios
presented or identify which type of NLP workload applies to the
requirements described.
Analysis of text
Tokenize Splitting text into words and phrases.
Statistical Analyzing the terms used, including the frequency of the
appearance of individual words.
Frequency As well as the frequency of individual words,
identifying the frequency of phrases.
PosTag (Part of speech tagging) Assigning parts of speech (noun,
verb, or adjective) to each word.
Sentiment analysis Scoring the text for sentiment, as having a
positive or negative feeling.
Language detection Detecting the predominant language used in
the text.
Language modeling
Semantic modeling Identifying the relationships between words.
Named entity recognition (NER) Identifying objects (places,
dates, or quantities) in the text.
Topic detection Combining entities into topics to describe the
important topics present in the text.
Analysis of speech
Conversion of audio into text Analyzing and interpreting speech
and converting into text.
Conversion of text into audio Analyzing text, identifying phrases,
and synthesizing those phrases into spoken audio.
Translation
Automatic translation between languages for both text and speech.
People
Places
Organizations
Dates and times
Date
Duration
Time
Quantities
Age
Temperature
Speech recognition
Speech recognition does not simply recognize words; it must find patterns
in the audio. Speech recognition uses an acoustic model that
converts the audio into phonemes. A phoneme is the smallest unit of sound
in speech. When we teach children to read, we teach how letters are
represented by sounds. In English, however, there are 26 letters in the
alphabet, but there are 44 phonemes, and similar words can be pronounced
very differently.
The baked good scone is an example of how the same word can use different phonemes.
Scone has two common pronunciations in England: one that rhymes with
“cone” and another that rhymes with “gone.” If you are in Scotland, you
might hear it pronounced as “skoon.”
Once the phonemes have been identified, the language model then maps
phonemes to words, using statistical algorithms that predict the most
probable sequence of words based on the phonemes.
An example of speech recognition can be found in Microsoft
PowerPoint. The Presenter Coach tool monitors your speech and uses
speech recognition to give you a statistical report for a rehearsal of your
presentation. It will tell you if you have used filler words or
euphemisms, and it detects if you are just reading the text from the slide. It
will also provide suggestions to improve your delivery.
Other common NLP scenarios for speech recognition include interactive
voice response in call centers, transcribing telephone calls, and in-home
automation.
Speech synthesis
Synthesized speech does not simply generate a sound for each word. The text is first broken into phrases, then into smaller prosodic units, and then into phonemes.
A voice is then applied to convert the phonemes into audio speech. The
voice defines the pitch, speaking rate, and intonation for the generated
audio.
You find speech synthesis in personal digital assistants, like Siri and
Cortana, that respond vocally to your requests.
Common NLP scenarios for speech synthesis are broadcasting arrivals
and departures at airports, reading out text messages while you are driving
your car, and screen reading software applications for visually impaired
people.
Describe translation
There are many languages in the world, and the ability to convert text from
one language into another language is a feature of NLP.
Translation is very difficult for humans to do. Disclaimer: My wife is a
translator from Italian to English. Languages do not have simple word-for-
word translations; for instance, there are words in Italian that can be one of
multiple words in English, depending on the context. There are Italian
words that do not have a simple translation, such as Ciofeca, which means a
poor-quality, badly prepared drink, such as coffee. Another example is volume: in Italian it means the space inside an object, whereas in English volume is generally a measurement of that space; it can also refer to the level of sound or to a book in a series. The German word Schadenfreude has no direct translation and means deriving pleasure from someone else's misfortune. In Finnish, the word Kalsarikännit means the
feeling you get when sitting at home getting drunk in your underwear. As
you can see, translation is hard and relies on context and a lot of
knowledge.
Most human translators translate from one language into their native
language; they do not translate the other way around, as they need to make
the text understandable to the native speaker. You have probably seen poorly translated instruction manuals where a native speaker was not involved.
To translate from one language to another requires models for both
languages and the ability to understand the context in which each language is used. This involves understanding rules of grammar, use of informal
language, and large dictionaries and glossaries.
AI-powered text translation uses large amounts of translated text to train
the translation models. You will find that the translations are better where
there are more examples of the text available in each language, so the
results of translation between English and other languages are often better
than translation between other languages. Translation results tend to be
better on news and marketing documents than on highly technical
documents, as again, there are more examples of the former available to
train the models on.
Translation includes the conversion of both text and audio speech from
one language into another:
Text Text translation translates the text documents from one language
to another language.
Speech Speech translation translates spoken audio from one language
to another language.
Now that we have explained the common NLP workloads, we will look
at the services in Azure Cognitive Services for the major NLP workloads.
The speech services allow you to add speech processing into your apps:
Speech to Text Transcribes audio into text in real time or from audio
files.
Text to Speech Synthesizes text into spoken audio.
Speech Translation Converts audio into text and translates into
another language in real time. Speech Translation leverages the
Translator service.
Speaker Recognition Identifies people from the voices in an audio
clip.
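These capabilities are exposed through a single Speech resource in Azure Cognitive Services. A minimal CLI sketch of creating one (assuming the SpeechServices kind; names and region are placeholders):
az cognitiveservices account create --name <unique name> --resource-group <resource group name> --kind SpeechServices --sku S0 --location <region> --yes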
The Microsoft Azure AI Fundamentals certification encompasses the
capabilities of Natural Language Processing services. This requires you to
understand the capabilities of the services described in this section.
EXAM TIP
You need to be able to map the service to the scenario presented and
to identify the operation for a service to use.
Detect language
Text Analytics supports a wide range of languages. A key operation is the
identification of the language used in a document. Text Analytics can detect
a wide range of languages, variants, dialects, and some regional/cultural
languages, for example, French (fr-FR) vs. Canadian French (fr-CA).
The Detect language operation returns the name of the language, the
ISO language code, and a level of confidence between 0 and 1. If there is
more than one language in the document, then the predominant language is
returned.
The request URL is formulated as follows:
https://{endpoint}/text/analytics/v3.0/languages
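As a hedged sketch (endpoint and key are placeholders), the documents to analyze are passed in the request body as id and text pairs:
curl -X POST "https://<endpoint>/text/analytics/v3.0/languages" \
  -H "Ocp-Apim-Subscription-Key: <key>" \
  -H "Content-Type: application/json" \
  -d '{"documents": [{"id": "1", "text": "This document is written in English."}]}'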
This operation found that the language was English, an ISO code “en,”
with a confidence score of 100%.
Sentiment analysis
Sentiment analysis analyzes the emotion for each sentence in a piece of text
and for the whole document.
Sentiment is a classification model that evaluates the emotion of the text
as to how positive or negative it is. The operation returns a sentiment score
between 0 and 1, with 1 as the most positive, and a sentiment label
(positive, negative, neutral, or mixed).
Sentiment analysis uses a model that has been pre-trained on millions of
examples of text. Currently, sentiment can be evaluated for 13 languages.
The request URL is formulated as follows:
https://{endpoint}/text/analytics/v3.0/sentiment
"sentences": [
The key phrase extraction operation found five key phrases in the
paragraph of text.
Person The names of people. This is not limited to famous people but
can identify forenames and surnames in the text.
PersonType The job type or job title.
DateTime Dates and times of day including durations and date
ranges.
Quantity Numerical measurements and units including temperature,
percentages, ages, and dimensions.
Location A geographical feature, landmark, building, or city.
Organization The names of companies, political groups, musical
bands, sports teams, government bodies, and public organizations.
Event The names of historical, social, and other events.
Product Physical objects. Currently these are computing related.
Skill Capabilities, skills, or expertise.
Address Addresses including street, city, and postal code.
Phone number Telephone numbers.
Email Email addresses.
URL URLs to websites.
IP Network IP addresses.
The operation returns the entity, its category, and a confidence score
between 0 and 1. Currently, entities can be extracted for 23 languages,
including Arabic.
Two Wikipedia articles have been identified: one for Translation and the
other related to Cognitive Services.
Key phrase extraction can extract the main talking points in the
conversation. This can be used to categorize the call.
Named entity recognition can extract entities such as people’s names,
company names, locations, dates, and personal information. This data
can enhance the data held for the customer.
Another use case for Text Analytics is in handling compliance. You can
scan the emails and call recordings made by your sales team to automate
compliance checking by scanning for mentions of key phrases or named
entities that represent your products and services.
Key concepts
There are three key concepts in LUIS that you need to understand before
creating LUIS applications: intents, utterances, and entities.
EXAM TIP
Understanding the difference between these concepts is a
fundamental skill in this exam.
You first define your intents. Intents are linked to the actions that your
client application can perform. You should create an intent when you want
to trigger an action in your client application. You then add a few potential
utterances to each intent. LUIS then takes these examples of phrases with
the intents and starts training the model. These training utterances are used
by your LUIS model to determine which intent the user is referring to.
Let’s look at an example for intents and utterances. We will use a LUIS
model to power a bot that handles requests around creating and using
Cognitive Services. Consider this list of intents and associated utterances:
CreateCognitiveService
“I must deploy Cognitive Services”
“I want to create a Cognitive Service resource”
“I need to generate a new LUIS authoring resource”
“Create Azure Cognitive Services”
CallCognitiveService
“I want to evaluate the sentiment for this sentence”
“I need to determine which language this text is in”
“I have to translate this document”
“What is the best service for extracting text from an image”
LUIS does not have deep semantic Natural Language Processing. For
instance, LUIS cannot automatically differentiate different verb tenses or
alternative words. As an example, LUIS is unable to determine if the words
add, adding, and added have the same intent. LUIS is unable to
automatically recognize that the words add, create, and generate have the
same intent. Therefore, you need to provide a set of utterances to help LUIS
handle the different ways a user might phrase their requests. LUIS has been
developed so that you only need to provide a few sample utterances for
LUIS to create a model with good accuracy and do not have to provide
every possible variation yourself.
When creating utterances, you need to provide different ways of saying
the same thing with different verb tenses and substitute wording. Microsoft
recommends that you should create between 10 and 30 utterances per
intent. The more example utterances you provide, the more accurate your
model will be.
You also need intents and utterances for greetings and other non-action
phrases that a user might employ.
Entities are the information required to perform the action behind the
intent; they are the data for the action. For the previous example, the
entities might be:
Cognitive Services
LUIS authoring resource
Resource
Document
Sentence
Text
Paragraph
Image
Once you define your utterances and entities, you can improve the
accuracy of the language model by adding hints, known as features, by
providing variations for the words used. LUIS will then use these features
when recognizing the intent and entities.
There are prebuilt entities you can use in your model, and you can
specify your own custom entities. There are four types of entity you can
create: machine-learned, list, regular expression (Regex), and Pattern.any.
Prebuilt models
LUIS contains several prebuilt models. These models provide combinations
of intents, utterances, and entities. You can use a prebuilt domain model that
contains intents, utterances, and entities, or you can use a prebuilt intent
model that contains intents and utterances, or you can use a prebuilt entity
model.
The following prebuilt entities are available in LUIS:
Age
Currency
DateTime
Dimension
Email
Geography
KeyPhrase
Number
Ordinal
Percentage
PersonName
Phonenumber
Temperature
URL
Custom schema
A custom schema consists of intents and, optionally, entities. A new custom
schema has no intents or entities. You can add any prebuilt domain, intents,
and entities to a custom model. You are not just restricted to a single
prebuilt model but can add as many as required.
You can, of course, create your own intents, utterances, and entities and
combine them with the prebuilt models to create the schema for your LUIS
app.
LUIS app
To use LUIS, you will need to create a LUIS app that describes the model
for your domain. LUIS requires both an authoring resource and a prediction
resource. The authoring resource is used to create, manage, train, test, and
publish your applications. The prediction resource is used by other
applications after you publish your LUIS application to understand text
inputs.
LUIS uses a web portal (https://www.luis.ai) where you can create your
model, add example utterances, train the model, and finally deploy the app.
Understanding machine learning is not necessary to use LUIS. Instead,
you define the intents and entities and then provide example utterances to
LUIS, indicating how those utterances relate to the intents and entities.
LUIS uses this information to train the model. You can improve the model
interactively by identifying and correcting prediction errors.
The process for creating a LUIS app is as follows:
Build a LUIS schema Define the domain and add intents and
entities.
Add utterances Add training example phrases for each intent.
Label entities Tag the entities in each utterance.
Add features Create phrase lists for words with similar meanings.
Train Train the model in the app.
Publish Publish the app to an endpoint using the prediction resource.
Test Test your LUIS app using the published endpoint.
Before you can create a LUIS app, you need to create your LUIS
resources. You need both an authoring resource and a prediction resource.
The rest of this section will walk you through creating LUIS resources and
creating a LUIS app.
To create a LUIS resource in the Azure portal, search for
Language Understanding and pick the service titled just Language
Understanding, as shown in Figure 4-4.
FIGURE 4-4 Language services in Azure Marketplace
Clicking on Create will show the description for the service, as shown in
Figure 4-5.
Once your resources have been created, you will need to obtain the
REST API URL and the key to access the resources. To view the endpoint
and keys in the Azure portal, navigate to the resource and click on Keys
and Endpoint, as shown in Figure 4-7.
FIGURE 4-7 Keys and Endpoint
Once you have created your LUIS resources, you can open the LUIS
web portal (https://www.luis.ai) in a browser. You will be prompted to
select your subscription and authoring resource, as shown in Figure 4-8.
FIGURE 4-8 Selecting the LUIS authoring resource
You then need to click on + New app in the LUIS portal to create a new
app. You need to enter a name for the app, select the culture (the language the app will understand), and choose your prediction resource. If a
window appears titled, “How to create an effective LUIS app,” you can
close this window.
Add the following prebuilt domains to your app:
Utilities
Web
Click on Intents in the left-hand navigation pane. You will now see that
a number of intents from these domains have been added to your app.
You can now add your own intents. Click + Create, create the following
intents, and add the example user inputs (utterances).
CreateCognitiveService
CallCognitiveService
For each intent, add example user inputs (utterances), such as those listed earlier in this section.
Click on Entities in the left-hand navigation pane. You will see the
prebuilt entities already added from the prebuilt domains you selected, as
shown in Figure 4-10.
Click on the Add prebuilt entity and add the following entities:
keyPhrase
personName
Click on the Add prebuilt domain entity and add the Places.product
entity.
You can now add your own entities. Click + Create, name your entity
AzureServices, and choose the List type. Enter the following values and
synonyms:
Cognitive Services
Computer Vision
Custom Vision
Face
OCR
LUIS
Language Understanding
Text Analytics
Sentiment Analysis
Language Detection
Translator
You now need to tag the entities in each of the utterances. Edit the
CreateCognitiveService intent. Use the mouse to select the name of an
Azure service in an utterance. As you select a word or phrase, a window
will appear where you can select an entity. Choose the AzureServices
entity, as shown in Figure 4-11.
FIGURE 4-11 Tag an utterance with an entity
You are now ready to train your model. Click on the Train button at the
top of the LUIS authoring page. Training will take a few minutes.
The LUIS portal allows you to test your app interactively. Click on the
Test button at the top of the LUIS authoring page. A pane will appear
where you can enter a test utterance. First try one of the example utterances
you added to an intent, for example, “I have to translate this document.” You
can see the results as shown in Figure 4-13.
You should see in the results the selected intent (action) and entity
(data). This is the information that your client application can use to
perform the action on the data.
Try other phrases with different wording and evaluate the model. If
LUIS does not correctly predict the intent, you can fix this by assigning the utterance to the correct intent.
When you have completed the testing of your LUIS app, you can
publish the app. After publishing, a prediction endpoint URL will be
available.
To use a Language Understanding (LUIS) model to find the intent of a
text statement in a client application, you need the ID of the LUIS app and
the endpoint and key for the prediction resource, not the authoring resource.
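As a rough sketch (assuming the v3.0 prediction endpoint format; all values are placeholders), a client application sends the query text to the published slot and receives the top-scoring intent and any recognized entities:
curl "https://<prediction endpoint>/luis/prediction/v3.0/apps/<app ID>/slots/production/predict?subscription-key=<prediction key>&query=I%20have%20to%20translate%20this%20document"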
Speech to Text
The Speech to Text API detects and transcribes spoken input into text. The
Speech to Text operation can transcribe audio into text in real-time or from
a recording. It converts fragments of sound into text using the acoustic
model and then uses the language model to create words and phrases.
Speech to Text converts audio from a range of sources, including
microphones, audio files, and Azure Blob storage.
The Speech to Text API can be used synchronously (real-time) or
asynchronously (batch). There are two separate APIs: one for short audio
(up to 60 seconds) that you can transcribe in real-time, and the other for
batch transcription. The batch Speech to Text API can translate large
volumes of speech audio recordings stored in Azure Blob Storage.
More than 85 languages and variants are supported.
You can see how Speech to Text works without an Azure subscription at
https://azure.microsoft.com/services/cognitive-services/speech-to-text, as
shown in Figure 4-14.
In this example, the first paragraph from this section was read aloud
using the computer’s microphone. The speech was recognized and
transcribed reasonably accurately but has a couple of errors. It could not
differentiate between 15 and 50, and mis-transcribed Cognitive as
“congresses.” The Speech to Text service is typically faster and more
accurate than human transcription.
Speech to Text partitions the audio based on the speakers’ voices to
determine who said what. This allows you to obtain transcripts with
automatic formatting and punctuation.
Text to Speech
Text to Speech is useful when you cannot look at a screen, for example when you are controlling other equipment or using a mobile device. Text to Speech
generates, or synthesizes, text into spoken audio.
The Text to Speech API converts text into synthesized speech. You can
choose from neural voices, standard voices, or a custom voice. You can
also create your own custom voice for use in speech synthesis. There are
over 200 voices available, and the Text to Speech service supports over 60
languages and variants.
You can see how Text to Speech works without an Azure subscription at
https://azure.microsoft.com/services/cognitive-services/text-to-speech, as
shown in Figure 4-15.
In this example, the first paragraph from this section is synthesized into
audio and played through the computer’s speakers. The results are very
impressive and very lifelike.
Speech Translation
You could achieve translation of speech yourself using a mixture of the
Speech to Text, Translator, and Text to Speech services. The Speech
Translation service simplifies this process for you. It first detects and
transcribes speech into text; then it processes the text to make it easier to translate. The text is then fed to the text translation service to convert to
the target language. Finally, the translated text is synthesized into audio.
Speech Translation converts audio into text and translates into another
language in real-time or in batch. Speech Translation can translate audio
into more than 60 languages. Speech Translation performs both speech-to-
text and speech-to-speech translations.
You can see how Speech Translation works without an Azure
subscription at https://azure.microsoft.com/services/cognitive-
services/speech-translation, as shown in Figure 4-16.
In this example, the first paragraph from this section was read aloud
using the computer’s microphone. The speech was recognized and
translated into the target language. However, it has not translated it
correctly; the first half of the text has been split incorrectly into sentences
and has several errors. The second half is reasonably well translated.
Speaker Recognition
Speaker Recognition identifies speakers from their voice characteristics in
an audio clip. Speaker Recognition answers the question: “Who is
speaking?”
To be identified, speakers must be enrolled using sample audio
recordings of their voice. The Speaker Recognition service extracts the
characteristics of the voice to form a profile. This profile is used to verify
that the speaker is the same person (speaker verification) or to identify the
speakers in a conversation (speaker diarization). Speaker diarization can be
used to assist in creating transcripts of conversations.
Chapter summary
In this chapter, you learned some of the general concepts related to Natural
Language Processing. You learned about the features of Natural Language
Processing, and you learned about the services in Azure Cognitive Services
related to language and speech. Here are the key concepts from this chapter:
Thought experiment
Let’s apply what you have learned in this chapter. In this thought
experiment, demonstrate your skills and knowledge of the topics covered in
this chapter. You can find the answers in the section that follows.
You work for Litware, Inc., a company with several brands that supplies
business to business services across the world. Litware is interested in
analyzing the large amount of text involved in their business using AI.
Litware wants to evaluate how Cognitive Services can improve their
internal document categorization.
Litware wants to create a single support desk to handle their worldwide
customer base. This central desk will provide consistent responses to
customers no matter their location or language.
Litware needs to understand how customers will respond to this move to
a single support desk. Customers are sent a questionnaire to ask them about
this move. The questionnaire has a series of questions and includes a space
for the customer to write their thoughts on this move. Customers can also
make a request as part of the questionnaire for more information or for
someone to contact them.
As part of this planned move, Litware monitors social media for
mentions about these proposed changes and records telephone calls into the
existing support desks.
Answer the following questions:
1. Which workload is used to evaluate how the customer feels about the
move to a central support desk?
2. Which workload is used to discover the topics mentioned by customers
in the questionnaire?
3. Named entity recognition extracts the intent and action from the
request in the questionnaire. Is this correct?
4. Which workload is used to monitor social media for negative mentions
of Litware’s brands?
5. Which workload is used to transcribe telephone calls into the support
desk?
6. Which Cognitive Service would you use to mine customer perceptions
of Litware’s products and services?
7. What examples of how users phrase their requests do you need to
provide to the LUIS app?
8. For what information do you need to use a published Language
Understanding model to find the meaning in a text statement?
9. Which service do you use to translate the large volumes of telephone
calls?
10. Do you have to use a standard voice, or can you create a custom voice
for text to speech?
This bot is an FAQ bot for a professional soccer team in the United
Kingdom, AFC Bournemouth, whose nickname is the Cherries. The bot is
named CherryBot. CherryBot answers common questions about the club
and attending games.
Another common use case for webchat bots is in the online ordering
process or for travel reservation and booking. For example, the Seattle
Ballooning website https://seattleballooning.com/experience/ allows you to
book your ballooning experience via a chatbot. Figure 5-4 shows a
screenshot of this bot.
FIGURE 5-4 Online ordering bot example
This bot asks a series of questions that allows customers to tailor their
experience and to choose what they want to book.
You will find webchat bots on ecommerce sites. There are even bots that
help you decide which clothes to purchase.
A more recent use case for webchat bots has been in healthcare to triage
people based on their symptoms. For example, the CDC has a Coronavirus
Self-Checker bot on their website: https://www.cdc.gov/coronavirus/2019-
ncov/symptoms-testing/coronavirus-self-checker.html. This bot was built
using the Azure services described in this chapter.
Cortana responds both with text and speech utilizing the Speech services
discussed in the previous chapter. Cortana can:
Manage your calendar
Join a meeting in Microsoft Teams
Set reminders
Find information on the web
Open apps on your computer
Windows 10 includes the Microsoft Virtual Assistant that you can start
by pressing the Start key and typing Get Help. Figure 5-6 shows a
screenshot of the Microsoft Virtual Assistant app.
FIGURE 5-6 Microsoft Virtual Assistant app
NOTE RESPONSIBLE AI
The Transparency principle for Responsible AI was discussed in
Chapter 1 and should be considered when building bots and using
conversational AI.
A user should always know they are interacting with a bot. When a bot
conversation is started, the bot should clearly state that it is a bot. The bot
should state its purpose and limitations, for instance by listing the scope of
what the bot can answer or do. A bot should enable the user to escalate or
transfer to a human.
Bots work well when they are limited solely to their purpose and do not
try to be too generic.
Clicking on Create will show the description for the service, as shown in
Figure 5-8.
FIGURE 5-8 Service description for the QnA Maker service
After clicking on the Create button, the Create QnA Maker pane opens,
as shown in Figure 5-9.
FIGURE 5-9 Creating QnA Maker resources
You will need to select your subscription and resource group. You will
then need to create a unique name for the service. This name will be the
domain name for your endpoints and so must be unique worldwide. There
are four resources to create: QnA Maker, Azure Search, App Service, and
App Insights. For each of these, you need to select the region and pricing
tier. Free tiers are available.
Clicking on Review + Create will validate the options. You then click on
Create to create the resources.
You can create the QnA Maker resources using the CLI as follows:
az cognitiveservices account create --name <unique name for authoring> --resource-group <resource group name> --kind QnAMaker --sku F0 --location <region> --api-properties qnaRuntimeEndpoint=<URL endpoint for App Service>
Once your resources have been created, you will need to obtain the
REST API URL and the key to access the resources. To view the endpoint
and keys in the Azure portal, navigate to the resource and click on Keys
and Endpoint, as shown in Figure 5-10.
FIGURE 5-10 Keys and Endpoint
QnA Maker portal
QnA Maker uses a web portal (https://www.qnamaker.ai) where you can
create your knowledge base, add key value pairs, train the model, and
finally publish the knowledge base.
The process for creating a knowledge base is as follows:
1. Name the knowledge base.
2. Populate the knowledge base from files and website URLs.
3. Create the knowledge base.
4. Train the knowledge base.
5. Test the knowledge base.
6. Publish the knowledge base to an endpoint.
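After the knowledge base is published (step 6), client applications can query it over REST. A hedged sketch, assuming the generateAnswer endpoint exposed by the App Service created earlier; the host name, knowledge base ID, endpoint key, and question are placeholders:
curl -X POST "https://<app service name>.azurewebsites.net/qnamaker/knowledgebases/<knowledge base ID>/generateAnswer" \
  -H "Authorization: EndpointKey <endpoint key>" \
  -H "Content-Type: application/json" \
  -d '{"question": "What time do the gates open on match days?"}'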
Once you have created your QnA Maker resources, you can open the
QnA Maker web portal (https://www.qnamaker.ai) in a browser. After
clicking Create a knowledge base, you will be presented with five steps.
The first step is to create a QnA Maker resource. This will open the Azure
portal, as shown in Figure 5-9. Because you have already created a QnA
Maker resource, you can skip this step and continue to Step 2 to select your
subscription and your QnA Maker resource, as shown in Figure 5-11.
FIGURE 5-11 Selecting the QnA Maker resource
The final step is to create your knowledge base. QnA Maker will
analyze the sources you added to extract question-and-answer pairs. This
will take a few minutes, and when complete, you can view the knowledge
base, as shown in Figure 5-13.
FIGURE 5-13 The QnA Maker knowledge base
Once you have completed editing the knowledge base, you can save and
train the knowledge base. This should only take a few minutes.
You can test the knowledge base directly in the portal shown in Figure
5-14.
Clicking on the Create Bot button opens the Azure portal, as shown in
Figure 5-16.
FIGURE 5-16 Create a QnA Maker bot
You need to create a unique name for the bot. This name will be the
domain name for the web app used by the bot and so must be unique
worldwide. You will need to select your subscription, resource group,
region, and pricing tier. A Free tier is available. After you click on Create,
the code for your bot will be generated, and you can select from C# or
Node.js. This code can be downloaded and customized. Clicking on Create
will automatically generate and deploy the bot using the Azure Bot Service.
You can test the bot in the Azure Portal from within the bot’s resource
page. The bot shown at the beginning of the chapter in Figure 5-2 is a bot
generated from a QnA Maker knowledge base.
Now that you have created a bot, let’s have a look in more detail at the
Azure Bot Service.
Composer
The Bot Framework Composer is a tool to build bots. Bot Composer
supports both LUIS and QnA Maker.
The Bot Framework Composer uses a visual user interface to create the
conversational flow and generate responses. Composer is a recent addition
to Azure Bot Services and is the subject of ongoing development to add
further features. Microsoft intends for Composer to be the primary tool for
developing bots.
The Bot Framework Composer is open source and is multi-platform
with support for Windows, Linux, and MacOS. You can find Bot Composer
on GitHub at https://github.com/microsoft/BotFramework-Composer.
Channels
The Azure Bot Framework separates the logic of the bot from the
communication with different services. When you create a bot, the bot is
only available for use embedded on websites with the Web Chat channel.
You can add channels to your bot to make the bot available on other
platforms and services.
One of the major benefits of the Azure Bot Service is that you develop
your bot once and connect to multiple channels without needing to change
the code for each channel to handle the specific requirements and formats
of that channel. The Azure Bot Service takes care of those requirements
and converting the formats.
The following channels are available for connection to bots:
Alexa
Direct Line
Direct Line Speech
Email
Facebook
GroupMe
Kik
Line
Microsoft Teams
Skype
Slack
Telegram
Telephone
Twilio (SMS)
Web Chat
Bot Lifecycle
The process for developing and deploying a bot is as follows:
Plan Decide on the goal for your bot, and decide if your bot requires
LUIS, Speech, or QnA Maker support. This is also a good time to
reflect on the principles of Responsible AI and how they will be
applied to your bot.
Build Coding of the bot.
Test Testing of bots is very important to make sure the bot is
behaving as expected.
Publish Once testing is complete, the developer can publish the bot
to Azure.
Connect to Channels Connect the bot to the channels where you
want your bot to be used.
Evaluate Bots typically are never finished. There are always changes
to the business domain that the bot is servicing, and that might mean
the bot is not as effective as it once was. You need to monitor how the
bot is performing.
We will now walk you through creating a bot using the Azure Bot
Service. First, you need an Azure Bot Service resource.
To create an Azure Bot resource in the Azure portal, search for Bot and
pick the service titled Web App Bot, as shown in Figure 5-18.
FIGURE 5-18 Web App Bot in Azure Marketplace
Clicking on Create will show the description for the service, as shown in
Figure 5-19.
FIGURE 5-19 Service description for the Web App Bot service
After clicking on the Create button, the Create Web App Bot pane
opens, as shown in Figure 5-20.
FIGURE 5-20 Creating Web App Bot resources
You need to create a unique name for the bot. This name will be the
domain name for the web app used by the bot and so must be unique
worldwide. You will need to select your subscription, resource group,
region, and pricing tier. A Free tier is available. There are four resources to
create: Bot, Web App, App Service Plan, and App Insights.
You can select the template from which you create your bot. You can
select from C# or Node.js templates. If you choose the Basic bot that
includes LUIS, you will need to specify the LUIS app to use. Clicking on
Create will automatically generate and deploy the bot using the Azure Bot
Service.
After creating your bot, you can see the resources created in your
resource group, as shown in Figure 5-21.
You can create a Web App Bot resource using CLI. You must first
register an app in Azure AD and then use the CLI command, as follows:
az bot create --name <bot handle> --resource-group <resource group name> --kind webapp --appid <Azure AD App ID>
Once your resources have been created, you should open the Web App
Bot resource and navigate to the overview pane, as shown in Figure 5-22.
FIGURE 5-22 Bot overview
Clicking on Download bot source code generates a ZIP file containing
the source code for your bot. You can edit this source code with Visual
Studio Code to add functionality and other processing, and then publish
your changes back to the Bot Service.
Clicking the Test in Web Chat opens the test pane for the bot, as shown
in Figure 5-23.
FIGURE 5-23 Test in Web Chat
You can test out your bot in this pane prior to deploying your bot.
Clicking on Connect to channels shows the currently enabled channels
and allows you to deploy your bot to other channels, as shown in Figure 5-
24.
FIGURE 5-24 Bot channels
Chapter summary
In this chapter, you learned some of the general concepts related to
conversational AI. You learned about the features of conversational AI, and
you learned about the services in Azure related to building bots. Here are
the key concepts from this chapter:
Thought experiment
Let’s apply what you have learned in this chapter. In this thought
experiment, demonstrate your skills and knowledge of the topics covered in
this chapter. You can find the answers in the section that follows.
You work for Tailspin Toys, a manufacturer of toys crafted from wood.
Tailspin Toys are concerned with customer experience and want to
implement an Omni-Channel customer support with an automated first-line
solution.
Tailspin Toys plans to launch a new line of products that teaches
children the basics of logic using moveable wooden tags to represent logic
gates and wooden gears to demonstrate how data is processed. These
educational toys will contain electrical contacts and sensors that will
capture the positions of the wooden components. As part of this launch, an
app for mobile phones is planned that can take pictures of the arrangement
of the components and confirm if they are correct, showing videos if
wrong, and reading out instructions to the customer. The app needs to use
speech input. Users can request the bot to upload their pictures to a
community board where customers can share their progress and
experiences.
The existing support system relies on customer support agents accessing
a variety of documents, databases, and applications, including the
following:
PDF files
Excel spreadsheets
An SQL database
Various SharePoint lists
Word documents held on SharePoint
A
Accountability principle of Responsible AI, 12
Adaptive Cards, 157
agents, 9. See also bots; chatbots; webchat bots
AI (artificial intelligence), 1, 115. See also ML (machine learning); NLP
(natural language processing); Responsible AI
anomaly detection, 6–7
algorithms, 7
applications, 7
computer vision, 7–8
applications, 7
conversational, 9, 149
bots, 155
IVR (Interactive Voice Response), 153
personal digital assistants, 153
Responsible AI and, 155–156
services, 156–157
use cases, 156
webchat bots, 152–153
datasets, splitting the data, 28–30
knowledge mining, 8–9
NLP (natural language processing), 8
normalization, 27–28
prediction, 6
Responsible, 9–10
training, 28
AI for Good, 13
algorithms
anomaly detection, 7
hyperparameters, 54
analyzing, images, 90–91
Content Moderator service and, 96
anomaly detection, 3
algorithms, 7
applications, 7
applications
for anomaly detection, 7
for computer vision, 7
for conversational AI, 9
of NLP, 8
apps
LUIS, 132–140
Seeing AI, 4
Automated Machine Learning (AutoML), 40, 58–63
Azure, 2. See also Bot Service; Cognitive Services; services
Azure Bot Framework, channels, 167–168. See also Bot Service
Azure Machine Learning, 3
Automated Machine Learning (AutoML), 58–63
designer, 63–66
creating a new model, 66–70
deploying a model, 70–72
frameworks and, 33
inferencing, 58
no-code machine learning, 58
scoring, 54–56
subscription, 34
training, 54
workflow, accessing, 36–37
workspace, 38
compute targets, 37
creating, 34–35
datastores, 42–43
importing data, 43–48
preparing data for training, 49–50
resources, 36
Azure Machine Learning studio, 39–40
compute instances, 40–42
Azure Marketplace, 34
Azure Seeing AI app, 4
B
bias, 27, 56
Fairness principle of Responsible AI, 10
binary prediction, 6
Bing Web Search, 80
binning, 51
Bot Service, 5
Bot Framework Composer, 167
Bot Framework Emulator, 167
Bot Framework SDK, 165
capabilities, 165
creating a bot, 168–173
templates, 165–166
bots, 9, 149, 151, 155. See also Bot Service
Adaptive Cards, 157
channels, 172–173
creating, 168–173
creating in QnA Maker, 163–164
integrating with LUIS and QnA Maker, 166–167
lifecycle, 168–173
resources, 171
Responsible AI, 12–13
templates, 165–166, 170
use cases, 152–153
C
categories, 85. See also tags
celebrity model, 93–94
channels
Azure Bot Framework, 167–168
bot, 172–173
chatbots, 9, 128
Responsible AI, 12–13
classification, 3, 21–22
evaluation metrics, 31–32, 56–57
image, 85–86
facial detection, recognition and analysis, 88–89
object detection, 86
OCR (optical character recognition), 87–88
clustering, 3, 22–23
evaluation metrics, 33
recommender model, 23
Cognitive Services, 4, 77, 78
authentication keys, 83
capabilities, 4
Computer Vision service, 84, 89–90
analyze operation, 90–91
describe operation, 91
detect operation, 92
domain-specific models, 93–94
Get thumbnail operation, 94
OCR (optical character recognition), 94–96
tags, 92–93
containers, 84
Content Moderator service, 96
Custom Vision service, 96–97
creating a model, 97
Decision group, 79
deploying, 80–82
Face service, 104
facial detection, 104–107
features, 105
parameters, 106
recognition and, 107
Form Recognizer, 108–110
Language group, 79
language services, 122, 123
LUIS (Language Understanding service), 128, 129
app, 132–140
custom schema, 132
entities, 130–131, 137
intents, 129–130, 136–137
NLP and, 130
pre-built entities, 131, 132
pre-built models, 131–132
resources, 133–136
training, 138–139
use cases, 140
utterances, 130
Speech service, 79, 122, 140
capabilities, 140–141
Speaker Recognition, 143
Speech to Text, 141–142
Speech Translation, 142–143
Text to Speech, 142
use cases, 143
Text Analytics, 123–124
key phrase extraction, 125
language detection, 124
NER (named entity recognition), 126–127
sentiment analysis, 124–125
Translator service, 144
operations, 144–145
resources, 144
use cases, 145
Vision group, 80
compute clusters, 52–54
Computer Vision service, 7–8, 77, 84, 86, 89–90. See also Custom Vision
service
analyze operation, 90–91
applications, 7
bias and, 27
Content Moderator service and, 96
Custom Vision service and, 103–104
describe operation, 91
detect operation, 92
domain-specific models, 93–94
Face service and, 107–108
facial detection, recognition and analysis, 88–89
facial recognition capabilities, 107–108
Get thumbnail operation, 94
image classification, 85–86
key features, 84
object detection, 86
OCR (optical character recognition), 87–88, 94–96
tags, 92–93
use cases, 84–85
visual features, 90
containers, Cognitive Services, 84
Content Moderator service, 96
conversational AI, 9, 149
Azure services for, 156–157
bots, 9, 155
Adaptive Cards, 157
channels, 172–173
creating in QnA Maker, 163–164
integrating with LUIS and QnA Maker, 166–167
lifecycle, 168–173
resources, 171
use cases, 152–153
IVR (Interactive Voice Response), 153
personal digital assistants, 153
Cortana, 154
Microsoft Virtual Assistant, 154–155
Responsible AI and, 155–156
use cases, 149–150, 156
Cortana, 154
creating
bots, 168–173
datasets, 25
cross validation, 57
custom schema, 132
Custom Vision service, 96–97
Computer Vision service and, 103–104
creating a model, 97
creating an object detection model, 97–103
publishing your model, 103
training, 102–103
resources, 99
D
datasets
bias, 56
creating, 25
cross validation, 57
engineering, 26–27
feature(s)
engineering, 26–27, 51
selection, 50–51
importing into workspace, 43–48
normalization, 27–28, 49
partitioning, 49
splitting the data, 28–30
testing, 28
training, 30
Decision services, 79
deploying, Cognitive Services, 80–82
resources, 80–83
designer. See Azure Machine Learning, designer
Docker containers, Cognitive Services, 84
domains, 99–100
DSVM (Azure Data Science Virtual Machine), 34
E
emotion. See sentiment analysis
entities, 119, 126, 130–131, 132
tagging, 138–139
evaluation metrics, 49–50, 56–57
for classification models, 31–32
for clustering models, 33
for regression models, 30–31
F
Face service, 104
Computer Vision service and, 107–108
detection and, 104–107
features, 105
parameters, 106
recognition and, 107
facial recognition, 88–89, 104, 107
detection and, 104–107
features, 105
parameters, 106
Fairness principle of Responsible AI, 10, 27
feature(s), 19
engineering, 26–27, 51
facial recognition and, 105
normalization, 27–28
selecting, 26, 50–51
unsupervised learning and, 19
visual, 90
Form Recognizer, 108–110
OCR and, 110
pre-trained models, 109
frameworks, Azure Machine Learning and, 33
G-H-I
hyperparameters, 54
image(s). See also computer vision
analyzing, 90–91
classification, 85–86
categories, 85
object detection, 86
OCR (optical character recognition), 87–88
quality control, 86
color scheme, 85
Content Moderator service and, 96
descriptions, 91
domains, 99–100
facial detection, recognition and analysis, 88–89
object detection, 92
OCR (optical character recognition), 94–96
processing, 4, 7–8
tags, 92–93
thumbnails, creating, 94
Imagine Cup, 18
importing, datasets, 43–48
Inclusiveness principle of Responsible AI, 11
inferencing, 57–58
ingesting data, 43–48
intents, 129–130, 136–137
IVR (Interactive Voice Response), 153
J-K-L
key phrase extraction, 119, 125
knowledge mining, 8–9
labels, 19. See also feature(s); regression
identifying, 26
landmark model, 93–94
language modeling, 118–119
Language services, 79
QnA Maker, 157
creating a bot, 163–164
populating the knowledge base, 162–163
portal, 160–164
publishing the knowledge base, 163
resources, 158–160
LUIS (Language Understanding service), 128, 129
app, 132–140
bots and, 166–167
custom schema, 132
entities, 130–131, 137
intents, 129–130, 136–137
NLP and, 130
pre-built entities, 131, 132
pre-built models, 131–132
resources, 133–136
training, 138–139
use cases, 140
utterances, 130
M
Microsoft Virtual Assistant, 154–155
ML (machine learning), 1, 2–3, 17–18. See also Azure Machine Learning; feature(s); labels
anomaly detection, 3, 6–7
bias, 27
classification models, 3, 21–22
evaluation metrics, 31–32
clustering models, 3, 22–23
evaluation metrics, 33
compute instances, 40–42
datasets
creating, 25
feature engineering, 51
feature selection, 26, 50–51
normalization, 27–28
splitting the data, 28–30
testing, 28
training, 28
evaluation metrics, 56–57
inferencing, 57–58
labels, 19
models, 3–4, 18–19
overfitting, 57
regression, 3
evaluation metrics, 30–31
reinforcement learning, 19
supervised learning, 19
features, 19
labels, 19
regression, 19–21
training, 26, 30, 51–52, 54
compute clusters and, 52–54
preparing data for, 49–50
unsupervised learning, 19
workflow
create a dataset, 25
define the problem, 24–25
feature engineering, 26–27
feature selection, 26
identify labels, 26
models
celebrity, 93–94
creating, 66–70
for Custom Vision service, 97
deploying, 70–72
evaluation metrics, 49–50, 56–57
for classification models, 31–32
for clustering models, 33
for regression models, 30–31
landmark, 93–94
language, 118–119, 128
LUIS (Language Understanding service) and, 131–132
object detection, 97–103
scoring, 54–56
training, 26, 28, 30, 51–52, 54
compute clusters and, 52–54
preparing data for, 49–50
multiple outcome prediction, 6
N
NER (named entity recognition), 117, 119–120, 126–127
entities, 126
Neural Machine Translation (NMT), 144
NLP (natural language processing), 4, 8, 115, 116–117
applications, 8
key phrase extraction, 119
language modeling, 118–119
LUIS (Language Understanding service) and, 130
NER (named entity recognition), 119–120
sentiment analysis, 120
speech recognition, 120–121
speech synthesis, 121
text analytics techniques, 117–118
translation, 121–122
use cases, 118
NLU (Natural Language Understanding), 128
no-code machine learning, 58
normalization, 27–28, 49
numerical prediction, 6
O
object detection, 86
creating a model using Custom Vision service, 97–103
domains, 99–100
tagging, 101–102
OCR (optical character recognition), 87–88, 94–96
Form Recognizer and, 108–110
outlier detection, 6. See also anomaly detection
overfitting, 57
P
partitioning, 49
personal digital assistants, 153
Cortana, 154
Microsoft Virtual Assistant, 154–155
phoneme, 120
pipeline, 57–58
portal, QnA Maker, 160–164
prediction, 6
inferencing, 57–58
principles of Responsible AI
Accountability, 12
Fairness, 10
Inclusiveness, 11
Privacy and Security, 11
Reliability and Safety, 11
Transparency, 12, 155
Privacy and Security principle of Responsible AI, 11
Q
QnA Maker, 157. See also Azure Bot Service
bots and, 166–167
creating a bot, 163–164
creating resources, 158–160
populating the knowledge base, 162–163
portal, 160–164
publishing the knowledge base, 163
roles, 163
quality control, 86
R
recommender model, 23
regression, 3, 19–21
evaluation metrics, 30–31, 56
reinforcement learning, 19
Reliability and Safety principle of Responsible AI, 11
resources
bot, 171
Cognitive Services, 80–83
Custom Vision service, 99, 109
LUIS (Language Understanding service), 133–136
QnA Maker, 158–160
Translator service, 144
Responsible AI, 9–10
Accountability principle, 12
for bots, 12–13
for conversational AI, 155–156
Fairness principle, 10, 27
Inclusiveness principle, 11
Privacy and Security principle, 11
Reliability and Safety principle, 11
Transparency principle, 12, 155
roles, 163
S
scoring, 54–56
Seeing AI app, 4
semantic modeling, 117
sentiment analysis, 117, 119, 120, 124–125
services. See also Cognitive Services
Azure Cognitive Services, 4, 78
Azure Machine Learning, 3
Bot Service, 5
for conversational AI, 156–157
Speaker Recognition, 143
speech
recognition, 120–121
synthesis, 121
to text, 141–142
translation, 122
Speech service, 79
capabilities, 140–141
Speaker Recognition, 143
Speech to Text, 141–142
Speech Translation, 142–143
Text to Speech, 142
use cases, 143
subscription, Azure Machine Learning, 34
supervised learning, 19
classification, 21–22
features, 19
labels, 19
regression, 19–21
T
tags, 85, 90, 92–93, 117
entities and, 138–139
templates, bot, 165–166, 170
testing, 28
text
key phrase extraction, 125
language detection, 124
sentiment analysis, 124–125
translation, 122
Text Analytics, 123–124
key phrase extraction, 125
language detection, 124
NER (named entity recognition), 126–127
entities, 126
sentiment analysis, 124–125
use cases, 127–128
Text to Speech, 142
training, 26, 28, 30, 51–52, 54, 138–139. See also datasets; models
compute clusters and, 52–54
evaluation metrics, 56–57
object detection model, 99–100, 102–103
overfitting, 57
preparing data for, 49–50
translation, 121–122, 142–143. See also Speech service; Translator service
Translator service, 144
operations, 144–145
resources, 144
use cases, 145
Transparency principle of Responsible AI, 12, 155
U
unsupervised learning, 19
clustering, 22–23
use cases
for computer vision, 84–85
for conversational AI, 149–150, 156
for NLP, 118
for Speech service, 143
for Text Analytics, 127–128
for Translator service, 145
for webchat bots, 152–153
utterances, 130
V
virtual machines (VMs), compute instances, 40–42
Vision services, 80. See also computer vision
W-X-Y-Z
webchat, 151
webchat bots, 151. See also Azure Bot Service; bots
channels, 172–173
creating, 168–173
integrating with LUIS and QnA Maker, 166–167
lifecycle, 168–173
resources, 171
templates, 165–166, 170
use cases, 152–153