2-ML
• In this example, the system was not explicitly trained to predict the
safe internal temperature of a chicken drumstick.
• The system used the existing data and knowledge to fill in data gaps.
Natural Language Processing
• Natural Language Processing (NLP) is the ability of a computer
program to understand human language, both written text and
speech.
• The idea of giving computers the ability to process human language is
as old as the idea of computers themselves.
• The goal of NLP is to get computers to perform tasks involving
human language.
Natural Language Processing
• It includes tasks like enabling human-machine communication,
improving human-human communication, or simply doing useful
processing of text or speech.
• There are two main reasons why we want our computers to process
natural language: first, to communicate with humans, and
second, to acquire information from written language.
Natural Language Processing
• There are over a trillion pages on the web.
• Almost all of them are in human language.
• A computer program that wants to acquire knowledge needs to
understand the ambiguous, messy language that humans use.
• We use information seeking tasks such as text classification,
information retrieval, and information extraction.
• We use language models to address the tasks.
• Language models predict the probability distribution of language
expressions.
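As a rough illustration of "predicting a probability distribution", the sketch below builds a bigram model, one of the simplest kinds of language model. The toy corpus and the function names are assumptions made for this example only; they are not part of the course material.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count how often each word follows another in a toy corpus."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_distribution(counts, prev):
    """Return P(next word | prev) as a dict; empty if prev was never seen."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {word: c / total for word, c in followers.items()} if total else {}

# Hypothetical three-sentence corpus, used only for illustration.
corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram_model(corpus)
dist = next_word_distribution(model, "the")  # {"cat": 2/3, "dog": 1/3}
```

Here "the" is followed by "cat" twice and "dog" once, so the model assigns them probabilities 2/3 and 1/3. Practical language models condition on far more context, but the output has the same shape: a distribution over possible continuations.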
Planning
• Planning is about how an agent achieves its goals.
• To achieve anything but the simplest goals, an agent must reason
about its future.
• Because an agent does not usually achieve its goals in one step, what
it should do at any time depends on what it will do in the future.
• What it will do in the future depends on the state it is in, which, in
turn, depends on what it has done in the past.
Planning
• Automated planning is the ability of an intelligent system to act
autonomously and flexibly to construct a sequence of actions that
reaches the final goal.
• Rather than a pre-programmed decision-making process that goes
from A to B to C to reach a final output, automated planning is
complex and requires a system to adapt based on the context
surrounding the given challenge.
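One simple way to construct a sequence of actions is to search the space of states. The sketch below uses breadth-first search on a made-up 3x3 grid world; the grid, the blocked cells, and all names are hypothetical, and real planners use much richer state and action representations.

```python
from collections import deque

def plan(start, goal, actions):
    """Breadth-first search for a shortest action sequence from start to goal.

    `actions` maps an action name to a function state -> next state,
    returning None when the action does not apply in that state.
    """
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, steps = frontier.popleft()
        if state == goal:
            return steps
        for name, apply_action in actions.items():
            nxt = apply_action(state)
            if nxt is not None and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, steps + [name]))
    return None  # no plan exists

# Hypothetical world: cells (x, y) with 0 <= x, y <= 2; two cells are blocked.
BLOCKED = {(1, 0), (1, 1)}

def move(dx, dy):
    def apply_action(state):
        x, y = state[0] + dx, state[1] + dy
        ok = 0 <= x <= 2 and 0 <= y <= 2 and (x, y) not in BLOCKED
        return (x, y) if ok else None
    return apply_action

ACTIONS = {"up": move(0, 1), "down": move(0, -1),
           "left": move(-1, 0), "right": move(1, 0)}

route = plan((0, 0), (2, 2), ACTIONS)  # ["up", "up", "right", "right"]
```

The plan is not pre-programmed: if the set of blocked cells changes, the same search produces a different sequence of actions.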
Types of Machine Learning
• Learning is the ability of an agent to improve its behavior based on
experience. This could mean the following:
• The range of behaviors is expanded; the agent can do more.
• The accuracy on tasks is improved; the agent can do things better.
• The speed is improved; the agent can do things faster.
Types of Machine Learning
• There are four main types of learning:
• Supervised learning
• Unsupervised learning
• Self-supervised learning
• Reinforcement learning
Supervised Learning
• In predictive or supervised learning, the agent observes some
example input-output pairs and learns a function that maps inputs to
outputs.
Given a training set of 𝑁 example input-output pairs
(𝑥1, 𝑦1), (𝑥2, 𝑦2), (𝑥3, 𝑦3), …, (𝑥𝑁, 𝑦𝑁),
where each 𝑦𝑗 was generated by an unknown function 𝑦 = 𝑓(𝑥), discover a
function ℎ that approximates the true function 𝑓.
Supervised Learning
• Here 𝑥 and 𝑦 can be any kind of value; they need not be numbers.
• Each training input 𝑥𝑗 can be a vector of values, for example the height and weight of a person.
• These are called features, attributes, or covariates.
• The 𝑥𝑗 can also be a complex structured object such as an image, a sentence, an
email message, a time series, a molecular shape, a graph etc.
• The function ℎ is a hypothesis.
• Learning is a search through the space of possible hypotheses for one that will
perform well, even on new examples that are beyond the training set.
• To measure the accuracy of a hypothesis, we give it a test set of examples that
are separate from the training set.
• We say that the hypothesis generalizes well if it correctly predicts the value of 𝑦
for new examples.
Supervised Learning
• When the output 𝑦 is one of a finite set of categorical or
nominal values (such as sunny, cloudy, or rainy), the learning problem
is called classification or pattern recognition.
• When 𝑦 is a number (such as income level), the learning problem is
called regression.
Supervised Learning
• The figure shows examples of fitting a function of a single variable to some data
points. (a) Example (𝑥, 𝑓(𝑥)) pairs and a consistent, linear hypothesis. (b) A
consistent, degree-7 polynomial hypothesis for the same data set. (c) A
different data set, which admits an exact degree-6 polynomial fit or an
approximate linear fit. (d) A simple, exact sinusoidal fit to the same data
set.
Supervised Learning
• The figure shows examples of fitting a function of a single variable to
some data points.
• The examples are points in the (𝑥, 𝑦) plane, where 𝑦 = 𝑓(𝑥).
• We do not know what 𝑓 is, but we will approximate it with a
function ℎ selected from hypothesis space.
• Fig (a) shows some data with an exact fit by a straight line (the
polynomial 0.4𝑥 + 3).
• The line is called a consistent hypothesis because it agrees with all the
data.
Supervised Learning
• Figure (b) shows a high-degree polynomial that is also consistent with the
same data.
• This illustrates a fundamental problem: how do we choose from among
multiple consistent hypotheses?
• One answer is to prefer the simplest hypothesis consistent with the data.
• This principle is known as Ockham’s razor, after the 14th-century English
philosopher William of Ockham, who used it to argue against all sorts of
complications.
• Since a degree-1 polynomial is simpler than a degree-7 polynomial, (a)
should be preferred over (b).
Supervised Learning
• Figure (c) shows a second data set.
• There is no consistent straight line for this data set.
• It requires a degree-6 polynomial for an exact fit.
• There are just 7 data points, so a polynomial with 7 parameters does
not find any pattern in the data, and we do not expect it to generalize
well.
• A straight line that is not consistent with any of the data points may
still generalize well for unseen values of 𝑥.
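The contrast between an exact degree-6 fit and an approximate linear fit can be reproduced numerically. The sketch below is an illustration under stated assumptions: the "true" function, the noise values, and the test input are all invented for this example, and the exact fit is obtained by Lagrange interpolation rather than by whatever method produced the figure.

```python
def fit_line(xs, ys):
    """Least-squares fit of the hypothesis h(x) = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return lambda x: a * x + b

def fit_exact_polynomial(xs, ys):
    """Lagrange interpolation: the degree-(n-1) polynomial through all n points."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h

def f(x):
    """Hypothetical true function; in practice the learner never sees it."""
    return 0.5 * x + 1.0

train_x = [0, 1, 2, 3, 4, 5, 6]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5]  # made-up measurement noise
train_y = [f(x) + e for x, e in zip(train_x, noise)]

line = fit_line(train_x, train_y)              # approximate, 2 parameters
poly = fit_exact_polynomial(train_x, train_y)  # exact, 7 parameters

x_new = 7  # an input outside the training set
line_error = abs(line(x_new) - f(x_new))  # about 0.07
poly_error = abs(poly(x_new) - f(x_new))  # about 63.5
```

The degree-6 polynomial has zero error on the seven training points because it memorizes the noise, and it is off by more than 60 at the new input. The line misses every training point slightly yet stays close to the true function.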
Supervised Learning
• In general, there is a trade-off between complex hypotheses that fit
the training data well and simpler hypotheses that may generalize
better.
• In figure (d), we expand the hypothesis space to allow polynomials
over both 𝑥 and 𝑠𝑖𝑛(𝑥), and find that the data in (c) can be fitted
exactly by a simple function of the form 𝑎𝑥 + 𝑏 + 𝑐𝑠𝑖𝑛(𝑥). This
shows the importance of the choice of hypothesis space.
• The learning problem is realizable if the hypothesis space contains
the true function. Unfortunately, we cannot always tell whether a
given learning problem is realizable, because the true function is not
known.
Unsupervised Learning
• The second main type of machine learning is the descriptive or unsupervised
learning approach.
• Here we are only given a training set of 𝑁 features or inputs 𝑥1 , 𝑥2 , … , 𝑥𝑁 .
• We are not interested in prediction, because we do not have an associated
response variable 𝑦.
• Rather, the goal is to identify interesting things about the
measurements on 𝑥1 , 𝑥2 , … , 𝑥𝑁 .
• Is there an informative way to visualize the data?
• Can we discover subgroups among the variables or among the
observations?
Unsupervised Learning
• Unsupervised learning refers to a diverse set of techniques to answer such
questions.
• The goal of unsupervised learning is to find “interesting patterns” in the data.
• This is also known as knowledge discovery.
• Unsupervised learning is used for data visualization, data compression, and
data denoising, and to better understand the correlations present in the data at
hand.
• The two most common unsupervised learning techniques are principal component
analysis, a tool used for data visualization or for data pre-processing before
supervised techniques are applied, and clustering, which divides the
dataset into clusters of similar examples.
• Unsupervised learning is the bread and butter of data analytics, and it’s often a
necessary step in better understanding a dataset before attempting to solve the
supervised learning problem.
Self-supervised Learning
• This is a specific instance of supervised learning, but it is different
enough that it deserves its own category.
• Self-supervised learning is supervised learning without
human-annotated labels or response variables 𝑦.
• You can think of it as supervised learning without any human in the
loop.
• There are still labels or response variables involved (because the
learning must be supervised by something), but they are generated
from the input data, typically using a heuristic algorithm. Input data
can be labelled by finding and exploiting the relations (or
correlations) between different input signals.
Self-supervised Learning
• For instance, autoencoders, which learn to copy their input to their
output, are a well-known instance of self-supervised learning.
• The purpose of the autoencoder is to reconstruct its inputs by
minimizing the difference between the input and the output instead
of predicting the target value Y given inputs X.
• Therefore, autoencoders do not require labelled inputs to enable
learning.
• In the same way, trying to predict the next frame in a video, given
past frames, or the next word in a text, given previous words, are
instances of self-supervised learning (temporally supervised learning,
in this case: supervision comes from future input data).
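The next-word case shows how labels can be generated from the input itself. In the sketch below, the function name and the context size of two words are assumptions chosen for illustration.

```python
def next_word_pairs(text, context_size=2):
    """Turn raw text into (context, target) training pairs with no human labels."""
    words = text.split()
    return [
        (tuple(words[i:i + context_size]), words[i + context_size])
        for i in range(len(words) - context_size)
    ]

pairs = next_word_pairs("the quick brown fox jumps")
# [(("the", "quick"), "brown"),
#  (("quick", "brown"), "fox"),
#  (("brown", "fox"), "jumps")]
```

Each target word is simply the word that follows its context in the raw text, so an ordinary supervised learner can be trained on these pairs without anyone annotating the data.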
Reinforcement Learning
• In reinforcement learning, an agent receives information about its
environment and learns to choose actions that will maximize some
rewards or reinforcement.
• In a reinforcement learning problem, a robot can act in a world,
receiving rewards and punishments and determining from these what
it should do.
• Reinforcement learning differs from supervised
learning because the system isn’t trained with a sample data set.
• Rather, the system learns through trial and error. Therefore, a
sequence of successful decisions will result in the process being
“reinforced” because it best solves the problem at hand.
Reinforcement Learning
• Consider, for example, the problem of learning to play chess.
• A supervised learning agent needs to be told the correct move for each
position it encounters, but such feedback is seldom available.
• In the absence of feedback from a teacher, an agent can learn a transition
model for its own moves and can perhaps learn to predict the opponent’s
moves, but without some feedback about what is good and what is bad,
the agent will have no grounds for deciding which move to make.
• The agent needs to know that something good has happened when it
(accidentally) checkmates the opponent, and that something bad has
happened when it is checkmated—or vice versa, if the game is suicide
chess.
Reinforcement Learning
• This kind of feedback is called a reward, or reinforcement.
• In games like chess, the reinforcement is received only at the end of
the game.
• In other environments, the rewards come more frequently.
• In ping-pong, each point scored can be considered a reward; when
learning to crawl, any forward motion is an achievement.
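Learning from a reward that arrives only at the end can be sketched with tabular Q-learning, one common reinforcement learning algorithm (the slides do not name a specific one). The five-cell corridor, the reward of 1 at the last cell, the uniformly random exploration, and the parameter values are all assumptions made for this example.

```python
import random

N_STATES, GOAL = 5, 4              # corridor cells 0..4; reward only at cell 4
ACTIONS = {"left": -1, "right": +1}

def step(state, action):
    """Toy environment: move one cell, stop at the left wall, reward at the goal."""
    nxt = min(max(state + ACTIONS[action], 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward

def q_learning(episodes=200, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            action = rng.choice(list(ACTIONS))  # pure trial and error
            nxt, reward = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            # Move the estimate toward reward + discounted value of the next state.
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = nxt
    return q

q = q_learning()
# Greedy policy read off the learned table, one action per non-goal cell.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
```

Although the reward is received only on reaching the goal, the update propagates its value backwards, so the learned table prefers "right" in every cell.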
Reinforcement Learning
• Two of the most common applications of reinforcement learning are
robotics and game playing.
• Take the example of the need to train a robot to navigate a set of stairs.
• The robot changes its approach to navigating the terrain based on the
outcome of its actions.
• When the robot falls, the data is recalibrated, so the steps are navigated
differently until the robot is trained by trial and error to understand how to
climb stairs.
• In other words, the robot learns based on a successful sequence of actions.
Reinforcement Learning
• Reinforcement learning is also used in training self-driving
cars.
• In many ways, training a self-driving car is incredibly complex because there are
so many potential obstacles.
• If all the cars on the road were autonomous, learning by trial and error would be
easier.
• However, in the real world, human drivers can often be unpredictable.
• Even with this complex scenario, the algorithm can be optimized over time to find
ways to adapt to the state where actions are rewarded.
• One of the easiest ways to think about reinforcement learning is the way an
animal is trained to take actions based on rewards.
• If a dog gets a treat every time it sits on command, it will learn to take this
action each time.
Type of Machine Learning Algorithms
• Selecting the right machine learning algorithms is part art and part
science.
• Two data scientists may use two different machine learning
algorithms to solve the same business problem with the same data
set.
• Hence, understanding the different machine learning algorithms helps
data scientists select the best type of algorithm to solve a given
business problem.
Regression vs Classification
• Variables can be characterised as either quantitative or qualitative
(also known as categorical).
• Quantitative variables take on numerical values. Examples include
a person’s age, height, or income.
• The qualitative variables take on values in one of K different classes or
categories.
• Examples of qualitative variables include a person’s gender (male or
female), the brand of the product purchased (brand X, Y, or Z),
whether the person defaults on a debt (yes or no), or a cancer
diagnosis (Acute Myelogenous Leukemia, Acute Lymphoblastic
Leukemia, or No Leukemia).
Regression vs Classification
• Problems with a quantitative response are referred to as regression
problems.
• Those involving a qualitative response are referred to as classification
problems.
• However, the distinction is not always crisp.
• Least squares linear regression is used with a quantitative response.
• Logistic regression is typically used with a qualitative (two-class or
binary) response, so it is often treated as a classification method, but
because it estimates class probabilities it can also be thought of as a
regression method.
Regression vs Classification
• We tend to select the learning model on the basis of whether the
response is quantitative or qualitative.
• However, whether the predictors are quantitative or qualitative is
considered less important.
Bayesian
• Bayesian algorithms let data analysts encode their prior beliefs about
what the model should look like, independent of the data set.
• These algorithms are especially useful when you do not have massive
amounts of data to train the model confidently.
• A Bayesian algorithm is helpful if you have prior knowledge about
some part of the model and can encode that directly.
• For example, suppose you want to model a medical imaging diagnosis system that
looks for lung disorders.
• If a published journal study estimates the probability of different lung
disorders based on lifestyle, those probabilities can be encoded into the
model.
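How an encoded prior changes the conclusion can be shown with Bayes' rule. Every number below is hypothetical (not taken from any study), and the function and variable names are invented for this sketch.

```python
def posterior(prior, detection_rate, false_positive_rate):
    """Bayes' rule: P(disorder | positive finding on the scan)."""
    evidence = detection_rate * prior + false_positive_rate * (1 - prior)
    return detection_rate * prior / evidence

# Hypothetical priors, as if taken from a published lifestyle study.
PRIOR_GENERAL = 0.01   # assumed prevalence in the general population
PRIOR_SMOKER = 0.10    # assumed prevalence among long-term smokers

# Hypothetical scanner characteristics.
DETECTION_RATE = 0.90        # P(positive | disorder)
FALSE_POSITIVE_RATE = 0.05   # P(positive | no disorder)

p_general = posterior(PRIOR_GENERAL, DETECTION_RATE, FALSE_POSITIVE_RATE)
p_smoker = posterior(PRIOR_SMOKER, DETECTION_RATE, FALSE_POSITIVE_RATE)
# p_general is about 0.15, p_smoker is about 0.67
```

The same positive finding leads to very different conclusions depending on the prior, which is why encoding prior knowledge helps most when there is little training data.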
Clustering
• In clustering, objects with similar parameters are grouped together in
a cluster.
• All objects in a cluster are more like each other than they are like objects
in other clusters.
• Clustering is a type of unsupervised learning because the data is
not labelled.
• The clustering algorithm interprets the parameters that make up each
item and then groups the items accordingly.
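The grouping step can be sketched with k-means, one widely used clustering algorithm (the slides do not commit to a particular one). The six points and the two starting centroids are made up for illustration.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and re-centering."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical unlabelled data: two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
```

No labels are supplied; the three points near (1, 1.5) end up in one cluster and the three near (8.5, 8.5) in the other purely because of their coordinates.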
Decision Tree
• Decision tree algorithms use a tree-like graph or branching structure
to illustrate event outcomes, resource costs, and utility.
• The decision tree is a flowchart-like structure in which each internal
node represents a "test" on an attribute (e.g. whether the object is
a cat or a dog), and each branch represents an outcome of the test.
• Each leaf node represents a possible outcome.
• The paths from root to leaf represent classification rules.
• Percentages are assigned to nodes based on the likelihood of the
outcome occurring.
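The structure described above (internal nodes as tests, branches as outcomes, leaves as results, root-to-leaf paths as rules) can be written down directly. The tree below is hand-built with invented attributes; it is not learned from data, and the names are hypothetical.

```python
# Internal nodes hold a boolean test; leaves hold a label.
TREE = {
    "test": "barks",
    "yes": {"label": "dog"},
    "no": {
        "test": "retractable_claws",
        "yes": {"label": "cat"},
        "no": {"label": "other"},
    },
}

def classify(node, example):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while "label" not in node:
        node = node["yes"] if example[node["test"]] else node["no"]
    return node["label"]

def rules(node, path=()):
    """List every root-to-leaf path as a (conditions, label) classification rule."""
    if "label" in node:
        return [(path, node["label"])]
    test = node["test"]
    return (rules(node["yes"], path + ((test, True),))
            + rules(node["no"], path + ((test, False),)))

animal = {"barks": False, "retractable_claws": True}
label = classify(TREE, animal)   # "cat"
all_rules = rules(TREE)          # three rules, one per leaf
```

A learning algorithm such as CART would choose the tests automatically from training data; this sketch only shows how a finished tree is represented and used.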
Decision Tree
• Decision tree algorithms are one of the most widely used supervised
learning methods.
• Tree-based algorithms empower predictive models with high accuracy,
stability, and ease of interpretation.
• They can solve both classification and regression problems.
• Decision tree algorithms are also referred to as CART (Classification
and Regression Trees).
Dimensionality Reduction
• Dimensionality reduction helps systems remove data that’s not useful
for analysis.
• This group of algorithms is used to remove redundant data, outliers,
and other non-useful data.
• Dimensionality reduction can be helpful when analyzing data from
sensors and other Internet of Things (IoT) use cases.
Dimensionality Reduction
• In IoT systems, there might be thousands of data points simply telling
you that a sensor is turned on.
• Storing and analyzing that “on” data is not helpful and will occupy
important storage space.
• In addition, by removing this redundant data, the performance of a
machine learning system will improve.
• Finally, dimensionality reduction will also help analysts visualize the
data.
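The always-"on" sensor case can be sketched as dropping columns that never change. This is only the simplest form of reducing dimensions; techniques such as principal component analysis go further. The readings and the function name are made up for illustration.

```python
def drop_constant_columns(rows):
    """Keep only the columns whose value varies across rows."""
    columns = list(zip(*rows))
    keep = [i for i, col in enumerate(columns) if len(set(col)) > 1]
    reduced = [[row[i] for i in keep] for row in rows]
    return reduced, keep

# Hypothetical readings: [sensor_on, temperature, door_open]
readings = [
    [1, 20.5, 1],
    [1, 21.0, 0],
    [1, 19.8, 1],
]
reduced, kept = drop_constant_columns(readings)
# kept == [1, 2]; the constant "sensor_on" column is removed
```

The first column carries no information because it is identical in every row, so removing it saves storage and gives a later learning step fewer inputs to process.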
Neural Network and Deep Learning
• A neural network attempts to mimic the way a human brain
approaches problems and uses layers of interconnected units to learn
and infer relationships based on observed data.
• A neural network can have several connected layers.
• When there is more than one hidden layer in a neural network, it is
sometimes called deep learning.
• Neural network models can adjust and learn as data changes.
• Neural networks are often used when data is unlabeled or
unstructured.
• One of the key use cases for neural networks is computer vision.
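The idea of layers of interconnected units can be made concrete with a forward pass through a tiny network that has two hidden layers. The weights, biases, input, and the choice of the ReLU activation are all assumptions for this sketch; in practice the weights are learned from data.

```python
def relu(values):
    """Activation function: negative values become zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One layer: each unit computes a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """Pass the input through every hidden layer, then the output layer."""
    for weights, biases in layers[:-1]:
        x = relu(dense(x, weights, biases))
    weights, biases = layers[-1]
    return dense(x, weights, biases)

# Hypothetical fixed parameters: 2 inputs -> 2 hidden -> 2 hidden -> 1 output.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),   # hidden layer 1
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, -1.0]),  # hidden layer 2
    ([[2.0, 3.0]], [0.5]),                     # output layer
]

output = forward([2.0, 1.0], LAYERS)  # [5.5]
```

With the input [2, 1], the first hidden layer produces [1, 1.5], the second produces [2.5, 0] after ReLU, and the output unit returns 5.5. Training would adjust the numbers in LAYERS so that outputs match observed data.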
Neural Network and Deep Learning