ML Microsoft Course Overview: Machine Learning in Context
Course Overview
Training a Model
After that, we will focus on the core machine learning process: training a model. We'll
cover the entire chain of tasks, from data import, transformation, and management to
training, validating, and evaluating the model.
Foundational Concepts
Since this is an introductory course, our goal will be to give you an understanding of
foundational concepts. We'll talk about the fundamentals of supervised (classification
and regression) and unsupervised (clustering) approaches.
Classic Applications of ML
We'll also cover some of the best known specific applications of machine learning,
like recommendations, text classification, anomaly detection, forecasting, and feature
learning.
Managed Services, Cloud Computing, and Microsoft Azure
Many machine learning problems involve substantial requirements—things like model
management, computational resource allocation, and operationalization. Meeting all
these requirements on your own can be difficult and inefficient—which is why it's often
very beneficial to use Software as a Service (SaaS), managed services, and cloud
computing to outsource some of the work. And that's exactly what we'll be doing in this
course—specifically, we'll show you how to leverage Microsoft Azure to empower your
machine learning solutions.
Responsible AI
At the very end of the course, we'll talk about the broader impact of machine learning.
We'll discuss some of the challenges and risks that are involved with machine learning,
and then see how we can use principles of responsible artificial intelligence, such
as transparency and explainability, to help ensure our machine learning applications
generate positive impact, and avoid harming others.
Lesson 2:
Lesson Overview
In this lesson, our goal is to give you a high-level introduction to the field of machine
learning, including the broader context in which this branch of computer science exists.
Let's start with a classic definition. If you look up the term in a search engine, you might
find something like this:
Machine learning is a data science technique used to extract patterns from data, allowing
computers to identify related data, and forecast future outcomes, behaviors, and trends.
Let's break that down a little. One important component of machine learning is that we
are taking some data and using it to make predictions or identify important relationships.
But looking for patterns in data is done in traditional data science as well. So how does
machine learning differ? In this next video, we'll go over a few examples to illustrate the
difference between machine learning and traditional programming.
QUESTION 4 OF 5
Now imagine that you have some images that contain handwritten
numbers. You want to create a program that will recognize which number
is in each picture, but you're not sure exactly what characteristics can be
used to best tell the numbers apart.
Answer: machine learning
Applications of Machine Learning
The applications of machine learning are extremely broad! And the opportunities cut
across industry verticals. Whether the industry is healthcare, finance, manufacturing,
retail, government, or education, there is enormous potential to apply machine
learning to solve problems in more efficient and impactful ways.
We'll take a tour through some of the major areas where machine learning is applied,
mainly just to give you an idea of the scope and type of problems that machine
learning is commonly used to address.
Capture, prioritize, and route service requests to the correct employee, and
improve response times
A busy government organization receives an enormous number of service requests every year.
Machine learning tools can help capture incoming service requests, route them to the
correct employee in real time, refine prioritization, and improve response times.
You can check out this article if you're curious to learn more about ticket routing.
Further Reading
What’s the Difference Between Artificial Intelligence, Machine Learning and Deep
Learning? by Michael Copeland at NVIDIA
The data science process typically starts with collecting and preparing the data before
moving on to training, evaluating, and deploying a model. Let's have a look.
QUESTION 1 OF 3
Here are the typical steps of the data science process that we just
discussed. Can you remember the correct order?
Deploy the model and then retrain as necessary
Train the model and evaluate its performance
Collect and prepare the data
Match each description above to Steps 1 & 2, Steps 3 & 4, or Steps 5 & 6.
QUESTION 2 OF 3
Here are some of the steps once again, along with some of the actions
that you would carry out during those steps. Can you match the step with
the appropriate action?
(Again, first try to do it from memory—but have a look at the text or video
above if you get stuck.)
Steps:
Prepare the data
Evaluate the model
Train the model
Deploy the model
Actions (match each to a step):
Package the model and dependencies
Run the model through a final exam using data from your validation data set
Create features needed for the model
Select the algorithm, and prepare training, testing, and validation data sets
QUESTION 3 OF 3
In machine learning, often you have to tune parameters for the chosen
learning algorithm to improve the performance on relevant metrics, such
as prediction accuracy. At what stage of the data science lifecycle do you
optimize the parameters?
Answer: Training the model
In machine learning, all of our data ultimately needs to be represented as numbers. For
example, we may want to use gender information in the dataset to predict whether an
individual has heart disease. Before we can use this information with a machine
learning algorithm, we need to transform male vs. female into numbers (for instance, 1
means a person is male and 2 means a person is female) so that it can be processed. Note
that the values 1 and 2 here do not carry any meaning in terms of order or magnitude;
they are simply labels for the categories.
Another example would be using pictures uploaded by customers to identify if they are
satisfied with the service. Pictures are not initially in numerical form but they will need
to be transformed into RGB values, a set of numerical values ranging from 0 to 255, to
be processed.
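To make this concrete, here is a minimal sketch (with a made-up two-column dataset) of how
a categorical column might be mapped to integer codes using pandas:

    import pandas as pd

    # Hypothetical patient data: a categorical "sex" column and a numerical "age" column.
    df = pd.DataFrame({"sex": ["male", "female", "female", "male"],
                       "age": [54, 61, 47, 39]})

    # Map each category to an arbitrary integer code. The codes 1 and 2 simply
    # identify the categories; they carry no ordering or magnitude.
    df["sex_code"] = df["sex"].map({"male": 1, "female": 2})
    print(df)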
QUESTION 2 OF 2
Have a look at this chart showing the number of people who like each
flavor of ice cream:
What type of data is this?
Numerical
Time-Series
Categorical
Text
Tabular Data
In machine learning, the most common type of data you'll encounter is tabular data—
that is, data that is arranged in a data table. This is essentially the same format as you
work with when you look at data in a spreadsheet.
Here's an example of tabular data showing some different clothing products and their
properties:
Each row describes a single product (e.g., a shirt), while each column describes a
property the products can have (e.g., the color of the product).
Vectors
It is important to know that in machine learning we ultimately always work with
numbers, and specifically with vectors.
A vector is simply an array of numbers, such as (1, 2, 3)—or a nested array that
contains other arrays of numbers, such as (1, 2, (1, 2, 3)).
Vectors are used heavily in machine learning. If you have taken a basic course in linear
algebra, then you are probably in good shape to begin learning about how they are
used in machine learning. But if linear algebra and vectors are totally new to you, there
are some great free resources available to help you learn. You may want to have a look
at Khan Academy's excellent introduction to the topic here or check out Udacity's
free Linear Algebra Refresher Course.
For now, the main points you need to be aware of are that:
All non-numerical data types (such as images, text, and categories) must
eventually be represented as numbers
In machine learning, the numerical representation will be in the form of an array
of numbers—that is, a vector
As we go through this course, we'll look at some different ways to take non-numerical
data and vectorize it (that is, transform it into vector form).
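As a quick illustration, here is a minimal sketch of what a vector looks like in code,
using a NumPy array:

    import numpy as np

    # A vector is just an array of numbers.
    v = np.array([1, 2, 3])
    print(v)        # [1 2 3]
    print(v.shape)  # (3,) -- a one-dimensional array with three entries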
Scaling Data
Scaling data means transforming it so that the values fit within some range or scale,
such as 0–100 or 0–1. There are a number of reasons why it is a good idea to scale your
data before feeding it into a machine learning algorithm.
Let's consider an example. Imagine you have an image represented as a set of RGB
values ranging from 0 to 255. We can scale the range of the values from 0–255 down to
a range of 0–1. This scaling process will not affect the algorithm output since every
value is scaled in the same way. But it can speed up the training process, because now
the algorithm only needs to handle numbers less than or equal to 1.
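As a minimal sketch of that idea, here is how a handful of 8-bit pixel values could be
scaled from the 0-255 range down to 0-1 with NumPy:

    import numpy as np

    # A few example pixel intensities in the 0-255 range.
    pixels = np.array([0, 64, 128, 255], dtype=float)

    # Dividing by 255 rescales every value into the 0-1 range without
    # changing the relationships between the values.
    scaled = pixels / 255.0
    print(scaled)   # [0.         0.25098039 0.50196078 1.        ]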
Standardization
Standardization rescales data so that it has a mean of 0 and a standard deviation of 1.
The formula for this is:
(𝑥 − 𝜇)/𝜎
We subtract the mean (𝜇) from each value (x) and then divide by the standard deviation
(𝜎). To understand why this works, it helps to look at an example. Suppose that we have
a sample that contains three data points with the following values:
50
100
150
The mean of our data would be 100, while the sample standard deviation would be 50.
Let's try standardizing each of these data points. The calculations are:
(50 − 100)/50 = −1
(100 − 100)/50 = 0
(150 − 100)/50 = 1
Again, the result of the standardization is that our data distribution now has a mean of
0 and a standard deviation of 1.
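Here is a minimal sketch of the same calculation done with NumPy, using the three sample
values from above:

    import numpy as np

    x = np.array([50.0, 100.0, 150.0])

    mu = x.mean()          # 100.0
    sigma = x.std(ddof=1)  # 50.0 (sample standard deviation)

    standardized = (x - mu) / sigma
    print(standardized)    # [-1.  0.  1.]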
Normalization
Normalization rescales the data into the range [0, 1].
The formula for this is:
(x − x_min)/(x_max − x_min)
For each individual value, you subtract the minimum value (x_min) for that input in the
training dataset, and then divide by the range of the values in the training dataset. The
range of the values is the difference between the maximum value (x_max) and the
minimum value (x_min).
Let's try working through an example with those same three data points:
50
100
150
The minimum value (x_min) is 50, while the maximum value (x_max) is 150. The range of
the values is x_max − x_min = 150 − 50 = 100.
Plugging everything into the formula, we get:
(50 − 50)/100 = 0
(100 − 50)/100 = 0.5
(150 − 50)/100 = 1
Again, the goal was to rescale our data into values ranging from 0 to 1—and as you can
see, that's exactly what the formula did.
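And here is a minimal sketch of min-max normalization applied to the same three values
with NumPy:

    import numpy as np

    x = np.array([50.0, 100.0, 150.0])

    # Subtract the minimum and divide by the range (max - min).
    normalized = (x - x.min()) / (x.max() - x.min())
    print(normalized)      # [0.  0.5 1. ]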
Ordinal Encoding
In ordinal encoding, we simply convert the categorical data into integer codes ranging
from 0 to (number of categories – 1). Let's look again at our example table of
clothing products:
(Recall that the product table has the columns SKU, Make, Color, Quantity, and Price.)
With ordinal encoding, the Make and Color values are converted to integer codes:

Make encoding
A&F      0
Guess    1
Tillys   2

Color encoding
Red      0
Green    1
Blue     2
One of the potential drawbacks to this approach is that it implicitly assumes an order
across the categories. In the above example, Blue (which is encoded with a value of 2)
seems to be greater than Red (which is encoded with a value of 0), even though this is in
fact not a meaningful way of comparing those values. This is not necessarily a problem,
but it is a reason to be cautious in terms of how the encoded data is used.
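Here is a minimal sketch (with a made-up Color column) of ordinal encoding in pandas,
using the same integer codes as the table above:

    import pandas as pd

    df = pd.DataFrame({"Color": ["Blue", "Red", "Green", "Red"]})

    # Replace each category with its integer code (Red=0, Green=1, Blue=2).
    df["Color_code"] = df["Color"].map({"Red": 0, "Green": 1, "Blue": 2})
    print(df)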
One-Hot Encoding
One-hot encoding is a very different approach. In one-hot encoding, we transform
each categorical value into a column. If there are n categorical values, n new columns
are added. For example, the Color property has three categorical values: Red, Green,
and Blue, so three new columns Red, Green, and Blue are added.
If an item belongs to a category, the column representing that category gets the
value 1, and all other columns get the value 0. For example, item 908721 (first row in
the table) has the color blue, so we put 1 into that Blue column for 908721 and 0 into
the Red and Green columns. Item 456552 (second row in the table) has color red, so
we put 1 into that Red column for 456552 and 0 into the Green and Blue columns.
If we do the same thing for the Make property, our table can be transformed as
follows:
(The transformed table now has the columns SKU, A&F, Guess, Tillys, Red, Green, Blue,
Quantity, and Price.)
One drawback of one-hot encoding is that it can potentially generate a very large
number of columns.
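Here is a minimal sketch of one-hot encoding with pandas' get_dummies function, using a
made-up two-row version of the Color column:

    import pandas as pd

    df = pd.DataFrame({"SKU": [908721, 456552],
                       "Color": ["Blue", "Red"]})

    # Each Color value becomes its own 0/1 indicator column (Color_Blue, Color_Red).
    one_hot = pd.get_dummies(df, columns=["Color"], dtype=int)
    print(one_hot)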
QUESTION 1 OF 4
What type of encoding is used in the following table?

ID     Mammal   Reptile   Fish
012    1        0         0
204    0        0         1
009    0        1         0
105    1        0         0

Answer: one-hot encoding
QUESTION 4 OF 4
John is looking to train his first machine learning model. One of his inputs
includes the size of the T-Shirts, with possible values of XS, S, M, L, and XL.
What is the best approach John can employ to preprocess the T-Shirt size
input feature?
Image Data
Images are another example of a data type that is commonly used as input in machine
learning problems—but that isn't initially in numerical format. So, how do we represent
an image as numbers? Let's have a look.
Encoding an Image
Let's now talk about how we can use this data to encode an image. We need to know
the following three things about an image to reproduce it:
Horizontal position of each pixel
Vertical position of each pixel
Color of each pixel
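As a minimal sketch (assuming the Pillow library is installed and a file named photo.jpg
exists), here is how an image can be read into a NumPy array of pixel values:

    import numpy as np
    from PIL import Image

    # Read the image and make sure it has three color channels (R, G, B).
    img = Image.open("photo.jpg").convert("RGB")

    pixels = np.array(img)
    print(pixels.shape)  # (height, width, 3): one 0-255 value per channel per pixel
    print(pixels.dtype)  # uint8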
Text Data
Text is another example of a data type that is initially non-numerical and that must be
processed before it can be fed into a machine learning algorithm. Let's have a look at
some of the common tasks we might do as part of this processing.
Normalization
One of the challenges that can come up in text analysis is that there are often multiple
forms that mean the same thing. For example, the verb to be may show up
as is, am, are, and so on. Or a document may contain alternative spellings of a word,
such as behavior vs. behaviour. So one step that you will sometimes conduct in
processing text is normalization.
Text normalization is the process of transforming a piece of text into a canonical (official)
form.
Lemmatization is an example of normalization. A lemma is the dictionary form of a
word and lemmatization is the process of reducing multiple inflections to that single
dictionary form. For example, we can apply this to the is, am, are example we
mentioned above:
Original word    Lemmatized word
is               be
are              be
am               be
In many cases, you may also want to remove stop words. Stop words are high-
frequency words that are unnecessary (or unwanted) during the analysis. For example,
when you enter a query like which cookbook has the best pancake recipe into
a search engine, the words which and the are far less relevant
than cookbook, pancake, and recipe. In this context, we might want to
consider which and the to be stop words and remove them prior to analysis.
Here's another example:
Here we have tokenized the text (i.e., split each string of text into a list of smaller parts
or tokens), removed stop words (the), and standardized spelling
(changing lazzy to lazy).
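Here is a minimal hand-rolled sketch of these normalization steps (tokenization,
stop-word removal, and a tiny lemma lookup table); a real project would typically use a
library such as NLTK or spaCy instead:

    # Small, hand-picked stop-word list and lemma dictionary for illustration only.
    stop_words = {"the", "a", "an", "which"}
    lemmas = {"is": "be", "are": "be", "am": "be", "went": "go"}

    def normalize(text):
        tokens = text.lower().split()                        # tokenization
        tokens = [t for t in tokens if t not in stop_words]  # stop-word removal
        return [lemmas.get(t, t) for t in tokens]            # lemmatization

    print(normalize("The quick fox is lazy"))  # ['quick', 'fox', 'be', 'lazy']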
QUESTION 1 OF 5
Jack and Jill went up the hill. [Jack, and, Jill, go, up, the, hill]
Looking at the normalized text, which of the following have been done?
Tokenization
Lemmatization
Vectorization
After we have normalized the text, we can take the next step of actually encoding it in a
numerical form. The goal here is to identify the particular features of the text that will
be relevant to us for the particular task we want to perform—and then get those
features extracted in a numerical form that is accessible to the machine learning
algorithm. Typically this is done by text vectorization—that is, by turning a piece of
text into a vector. Remember, a vector is simply an array of numbers—so there are
many different ways that we can vectorize a word or a sentence, depending on how we
want to use it. Common approaches include:
Term Frequency-Inverse Document Frequency (TF-IDF) vectorization
Word embedding, as done with Word2vec or Global Vectors (GloVe)
The details of these approaches are a bit outside the scope of this class, but let's take a
closer look at TF-IDF as an example. The approach of TF-IDF is to give less importance
to words that contain less information and are common in documents, such as "the"
and "this"—and to give higher importance to words that contain relevant information
and appear less frequently. Thus TF-IDF assigns weights to words that signify their
relevance in the documents.
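Here is a minimal sketch of computing TF-IDF weights with scikit-learn's TfidfVectorizer;
the exact numbers will differ from any table shown in the course, but the idea is the same:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Three tiny "documents".
    docs = ["quick fox", "lazy dog", "rabid hare"]

    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(docs)       # one row of weights per document

    print(vectorizer.get_feature_names_out())    # vocabulary (column order)
    print(tfidf.toarray())                       # TF-IDF weight of each word in each document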
Here's what the word importance might look like if we apply TF-IDF to our example. (The
table's columns are the words quick, fox, lazy, dog, rabid, hare, and the; each cell holds
that word's TF-IDF weight in one of the documents.)
Here's what that might look like if we apply it to the normalized text, where the stop
word the has been removed. (The columns are quick, fox, lazy, dog, rabid, and hare.)
Let's pause to make sure this idea is clear. In the table above, what does
the value 0.56 mean?
It means that the word rabid has some importance (a TF-IDF weight of 0.56) in the
document [rabid, hare].
Feature Extraction
As we talked about earlier, the text in the example can be represented by vectors with
length 6 since there are 6 words total.
[quick, fox] as (0.32, 0.23, 0.0, 0.0, 0.0, 0.0)
[lazy, dog] as (0.0, 0.0, 0.12, 0.23, 0.0, 0.0)
[rabid, hare] as (0.0, 0.0, 0.0 , 0.0, 0.56, 0.12)
We understand the text because each word has a meaning. But how do algorithms
understand the text using the vectors? In other words, how do algorithms extract
features from the vectors?
Intuitively, [lazy, fox] is more similar to [lazy, dog] than to [rabid, hare] (the two
share the word lazy), so the vector distance between [lazy, fox] and [lazy, dog] is
smaller than the distance between [lazy, fox] and [rabid, hare].
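Here is a minimal sketch of that comparison with NumPy, using the vectors listed above
plus a made-up vector for [lazy, fox] (its weights on fox and lazy are assumptions for
illustration):

    import numpy as np

    # Column order: (quick, fox, lazy, dog, rabid, hare)
    lazy_fox   = np.array([0.0, 0.23, 0.12, 0.0, 0.0, 0.0])   # hypothetical weights
    lazy_dog   = np.array([0.0, 0.0, 0.12, 0.23, 0.0, 0.0])
    rabid_hare = np.array([0.0, 0.0, 0.0, 0.0, 0.56, 0.12])

    # Euclidean distance: smaller means more similar.
    print(np.linalg.norm(lazy_fox - lazy_dog))    # ~0.33 (shares the word "lazy")
    print(np.linalg.norm(lazy_fox - rabid_hare))  # ~0.63 (no shared words)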
QUESTION 4 OF 5
Imagine the words "monkey", "rabbit", "bird" and "raven" are represented
by vectors with the same length. Based on the meanings of the words,
which two words would we expect to have the smallest vector distance?
"monkey" and "rabbit"
Two Perspectives on ML
So the general idea is that we create models and then feed data into these models to
generate outputs. These outputs might be, for example, predictions for future trends
or patterns in the data.
This idea draws on work not only from computer science, but also statistics—and as a
result, you will often see the same underlying machine learning concepts described
using different terms. For example, a computer scientist might say something like:
We are using input features to create a program that can generate the desired output.
In contrast, someone with a background in statistics might be inclined to say something
more like:
We are trying to find a mathematical function that, given the values of the independent
variables, can predict the values of the dependent variables.
While the terminology is different, the challenge is the same: how to get the best
possible outcome.
QUIZ QUESTION
Can you match the terms below, from the computer science perspective,
with their counterparts from the statistical perspective?
Statistical terms: independent variable, dependent variable, function
Computer science terms: program, input, output
In the end, having an understanding of the underlying concepts is more important than
memorizing the terms used to describe those concepts. However, it's still essential to
be familiar with the terminology so that you don't get confused when talking with
people from different backgrounds.
Over the next couple of pages, we'll take a look at these two different perspectives and
get familiar with some of the related terminology.
What are some of the terms we can use to describe this data?
For the rows in the table, we might call each row an entity or an observation about an entity. In
our example above, each entity is simply a product, and when we speak of an observation, we are
simply referring to the data collected about a given product. You'll also sometimes see a row of
data referred to as an instance, in the sense that a row may be considered a single example (or
instance) of data.
For the columns in the table, we might refer to each column as a feature or attribute which
describes the property of an entity. In the above
example, color and quantity are features (or attributes) of the products.
ID   Name     Species   Age
1    Jake     Cat       3
2    Bailey   Dog       7
3    Jenna    Dog       4
4    Marco    Cat       12
Which of the following terms might we use to refer to the part of the table that is
highlighted?
(Select all that apply.)
A row
An attribute
An entity
An instance
An input vector
A feature
QUESTION 2 OF 2
(Same table as above.)
Which of the following terms might we use to refer to the part of the table that is
highlighted?
(Select all that apply.)
A column
An attribute
An entity
An instance
A feature
Statistical terminology
In statistics, you'll also see the data described in terms of independent
variables and dependent variables. These names come from the idea that the value
of one variable may depend on the value of some other variables. For example, the
selling price of a house is the dependent variable that depends on some independent
variables—like the house's location and size.
In the example of clothing products we looked at earlier in this lesson:
We might use data in each row (e.g. (908721, Guess, Blue, 789, 45.33)) to
predict the sale of the corresponding item. Thus, the sale of each item is dependent on
the data in each row. We can call the data in each row the independent variables and
call the sale the dependent variable.
1. Libraries. When you're working on a machine learning project, you likely will not
want to write all of the necessary code yourself—instead, you'll want to make use of
code that has already been created and refined. That's where libraries come in.
A library is a collection of pre-written (and compiled) code that you can make use of in
your own project. NumPy is an example of a library popularly used in data science,
while TensorFlow is a library specifically designed for machine learning. Read
this article for some other useful libraries.
2. Development environments. A development environment is a software application
(or sometimes a group of applications) that provides a whole suite of tools designed to
help you (as the developer or machine learning engineer) build out your
projects. Jupyter Notebooks and Visual Studio are examples of development
environments that are popular for coding many different types of projects, including
machine learning projects.
3. Cloud services. A cloud service is a service that offers data storage or computing
power over the Internet. In the context of machine learning, you can use a cloud
service to access a server that is likely far more powerful than your own machine, or
that comes equipped with machine learning models that are ready for you to use. You can
read more about different cloud services in this article.
For each of these components, there are multiple options you can choose from. Let's
have a look at some examples.
Notebooks
Notebooks were originally created as a documentation tool that others can use to
reproduce experiments. Notebooks typically contain a combination of runnable code,
output, formatted text, and visualizations. One of the most popular open-source
notebooks used today by data scientists and data science engineers is the Jupyter
Notebook, which can combine code, formatted text (Markdown), and visualizations.
A notebook contains several independent cells that allow for the execution of code
snippets within those cells. The output of each cell can be saved in the notebook and
viewed by others.
Libraries for Machine Learning
For your reference, here are all the libraries we went over in the video. This is a lot of
info; you should not feel like you need to be deeply knowledgeable about every detail of
these libraries. Rather, we suggest that you become familiar with what each library
is for, in general terms. For example, if you hear someone talking about matplotlib, it
would be good for you to recognize that this is a popular library for data visualization.
Or if you see a reference to TensorFlow, it would be good to recognize this as a popular
machine learning library.
Core Framework and Tools
Python is a very popular high-level programming language that is great for data
science. Its ease of use and wide support within popular machine learning platforms,
coupled with a large catalog of ML libraries, has made it a leader in this space.
Pandas is an open-source Python library designed for analyzing and
manipulating data. It is particularly good for working with tabular data and time-series
data.
NumPy, like Pandas, is a Python library. NumPy provides support for large,
multi-dimensional arrays of data, and has many high-level mathematical functions that
can be used to perform operations on these arrays.
Machine Learning and Deep Learning
Scikit-Learn is a Python library designed specifically for machine learning. It is
designed to be integrated with other scientific and data-analysis libraries, such
as NumPy, SciPy, and matplotlib (described below).
Apache Spark is an open-source analytics engine that is designed for cluster-
computing and that is often used for large-scale data processing and big data.
TensorFlow is a free, open-source software library for machine learning built
by Google Brain.
Keras is a Python deep-learning library. It provides an Application Programming
Interface (API) that can be used to interface with other libraries, such as TensorFlow, in
order to program neural networks. Keras is designed for rapid development and
experimentation.
PyTorch is an open source library for machine learning, developed in large part
by Facebook's AI Research lab. It is known for being comparatively easy to use,
especially for developers already familiar with Python and a Pythonic code style.
Data Visualization
Plotly is not itself a library, but rather a company that provides a number of
different front-end tools for machine learning and data science—including an open
source graphing library for Python.
Matplotlib is a Python library designed for plotting 2D visualizations. It can be
used to produce graphs and other figures that are high quality and usable in
professional publications. You'll see that the Matplotlib library is used by a number of
other libraries and tools, such as Scikit-Learn (above) and Seaborn (below). You can
easily import Matplotlib for use in a Python script or to create visualizations within a
Jupyter Notebook.
Seaborn is a Python library designed specifically for data visualization. It is based
on matplotlib, but provides a more high-level interface and has additional features for
making visualizations more attractive and informative.
Bokeh is an interactive data visualization library. In contrast to a library like
matplotlib that generates a static image as its output, Bokeh generates visualizations in
HTML and JavaScript. This allows for web-based visualizations that can have interactive
features.
QUIZ QUESTION
Below are some of the libraries we just went over. See if you can match
each library with its main focus.
Libraries: TensorFlow, Matplotlib, Pandas, PyTorch, Bokeh
Focus areas to match: machine learning, data visualization, analyzing/manipulating data