Artificial Intelligence - Unit 1 - 5
Expert Systems
➢ An expert system is a computer program that is designed to
solve complex problems and to provide decision-making
ability like a human expert.
(Or)
➢ Expert systems are computer applications developed to
solve complex problems in a particular domain, at a level
of extraordinary human intelligence and expertise.
➢ It performs this by extracting knowledge from its
knowledge base using the reasoning and inference rules
according to the user queries.
➢ The expert system is a part of AI; the first ES was
developed in 1970 and was among the first successful
applications of artificial intelligence.
➢ Expert Systems solves the most complex issue as an expert
by extracting the knowledge stored in its knowledge base.
The system helps in decision making for complex problems
using both facts and heuristics like a human expert.
➢ It is called so because it contains the expert knowledge of a
specific domain and can solve any complex problem of that
particular domain. These systems are designed for a
specific domain, such as medicine, science, etc.
➢ The performance of an expert system is based on the
expert's knowledge stored in its knowledge base. The more
knowledge stored in the KB, the better the system performs.
➢ One of the common examples of an ES is a suggestion of
spelling errors while typing in the Google search box.
Characteristics of Expert Systems
➢ High performance: The expert system provides high
performance for solving any type of complex problem
of a specific domain with high efficiency and accuracy.
➢ Understandable: It responds in a way that is easily
understood by the user. It can take input in human
language and provides the output in the same way.
➢ Reliable: It is highly reliable in generating efficient
and accurate output.
➢ Highly responsive: ES provides the result for any
complex query within a very short period of time.
Capabilities of Expert Systems
The expert systems are capable of −
➢ Advising
➢ Instructing and assisting humans in decision making
➢ Demonstrating
➢ Deriving a solution
➢ Diagnosing
➢ Explaining
➢ Interpreting input
➢ Predicting results
➢ Justifying the conclusion
➢ Suggesting alternative options to a problem
Incapabilities of Expert Systems
They are incapable of −
➢ Substituting human decision makers
➢ Possessing human capabilities
➢ Producing accurate output from an inadequate knowledge base
➢ Refining their own knowledge
Components of Expert Systems
The components of ES include −
➢ Knowledge Base
➢ Inference Engine
➢ User Interface
Knowledge Base:
➢ It contains domain-specific and high-quality knowledge.
➢ Knowledge is required to exhibit intelligence. The success
of any ES majorly depends upon the collection of highly
accurate and precise knowledge.
What is Knowledge?
➢ Data is a collection of facts. Information is data
organized as facts about the task domain. Data,
information, and past experience combined together are
termed knowledge.
Components of Knowledge Base:
The knowledge base of an ES is a store of both, factual and
heuristic knowledge.
➢ Factual Knowledge − It is the information widely accepted
by the Knowledge Engineers and scholars in the task
domain.
➢ Heuristic Knowledge − It is about practice, accurate
judgement, one’s ability of evaluation, and guessing.
Inference Engine (Rules Engine)
➢ The inference engine is known as the brain of the expert
system as it is the main processing unit of the system. It
applies inference rules to the knowledge base to derive a
conclusion or deduce new information. It helps in deriving
an error-free solution of queries asked by the user.
➢ With the help of an inference engine, the system extracts
the knowledge from the knowledge base.
The knowledge engineer and the domain expert usually work very
closely together for long periods of time throughout the several
stages of the development process.
1. Identification Phase
➢ To begin, the knowledge engineer, who may be unfamiliar with
this particular domain, consults manuals and training guides to
gain some familiarity with the subject. Then the domain expert
describes several typical problem states.
➢ The knowledge engineer attempts to extract fundamental concepts
from the similar cases in order to develop a more general idea of
the purpose of the expert system.
➢ After the domain expert describes several cases, the knowledge
engineer develops a ‘first-pass’ problem description.
➢ Typically, the domain expert may feel that the description does
not entirely represent the problem.
➢ The domain expert then suggests changes to the description and
provides the knowledge engineer with additional examples to
illustrate further the problem’s fine points.
Next, the knowledge engineer revises the description, and the domain
expert suggests further changes. This process is repeated until the
domain expert is satisfied that the knowledge engineer understands
the problems and until both are satisfied that the description
adequately portrays the problem which the expert system is expected
to solve.
2. Conceptualisation Phase
In the conceptualisation stage, the knowledge engineer
frequently creates a diagram of the problem to depict
graphically the relationships between the objects and processes
in the problem domain.
It is often helpful at this stage to divide the problem into a series
of sub-problems and to diagram both the relationships among
the pieces of each sub-problem and the relationships among the
various sub-problems.
As in the identification stage, the conceptualisation stage
involves a circular procedure of iteration and reiteration
between the knowledge engineer and the domain expert. When
both agree that the key concepts-and the relationships among
them-have been adequately conceptualised, this stage is
complete.
Probability
Examples:
➢In a drawer of ten socks where 8 of them are yellow, there is a
20% chance of choosing a sock that is not yellow.
➢There are 9 red candies in a bag and 1 blue candy in the same
bag. The chance of picking a blue candy is 10%.
P(¬A) = probability of event A not happening.
P(¬A) + P(A) = 1.
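The complement rule above can be checked in a few lines; the numbers are the sock example from this slide:

```python
# Complement rule: P(not A) = 1 - P(A).
# Sock example: 10 socks, 8 of them yellow.
p_yellow = 8 / 10
p_not_yellow = 1 - p_yellow  # 0.2, i.e. a 20% chance of a non-yellow sock
print(p_not_yellow)
```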
➢ Event: Each possible outcome of a variable is called an event.
➢ Sample space: The collection of all possible events is called
sample space.
➢ Random variables: Random variables are used to represent
the events and objects in the real world.
➢ Prior probability: The prior probability of an event is
probability computed before observing new information.
➢ Posterior Probability: The probability that is calculated after
all evidence or information has been taken into account. It is a
combination of prior probability and new information.
Conditional Probability
➢ Conditional probability is the probability of an event
occurring given that another event has already happened.
Example: Suppose in a class 70% of students like C and 40% like
both C and Java. Then P(Java | C) = 0.4 / 0.7 ≈ 57%, i.e., 57%
of the students who like C also like Java.
Prior Probability
Prior Probability- Degree of belief in an event, in the
absence of any other information
Example:
➢ P(rain tomorrow) = 0.7
➢ P(no rain tomorrow) = 0.3
Conditional Probability
What is the probability of an event, given knowledge of
another event?
Example:
➢ P(raining | sunny)
➢ P(raining | cloudy)
➢ P(raining | cloudy, cold)
Conditional Probability…
In some cases, given knowledge of one or more random
variables, we can improve our prior belief of another
random variable.
For example:
➢ P(slept in stadium) = 0.5
➢ P(slept in stadium | liked match) = 0.33
➢ P(didn’t sleep in stadium | liked match) = 0.67
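The improved belief in the stadium example follows from the definition P(A|B) = P(A ∧ B) / P(B). A small sketch, where the joint probability P(slept ∧ liked) = 0.2 and P(liked) = 0.6 are hypothetical numbers chosen purely to reproduce the 0.33 above:

```python
# Conditional probability: P(A | B) = P(A and B) / P(B).
# Hypothetical joint numbers consistent with the slide's 0.33.
p_liked = 0.6
p_slept_and_liked = 0.2
p_slept_given_liked = p_slept_and_liked / p_liked
print(round(p_slept_given_liked, 2))  # 0.33
```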
Bayes Theorem
➢ Bayes' theorem is also known as Bayes' rule, Bayes' law,
or Bayesian reasoning, which determines the probability
of an event with uncertain knowledge.
➢ In probability theory, it relates the conditional probability
and marginal probabilities of two random events.
➢ Bayes' theorem was named after the British
mathematician Thomas Bayes.
➢ The Bayesian inference is an application of Bayes'
theorem, which is fundamental to Bayesian statistics.
➢ It is a way to calculate the value of P(B|A) with the
knowledge of P(A|B).
Bayes Theorem …
➢ Bayes' theorem allows updating the probability prediction
of an event by observing new information of the real
world.
➢ Example: If cancer corresponds to one's age then by using
Bayes' theorem, we can determine the probability of cancer
more accurately with the help of age.
➢ Bayes' theorem can be derived using product rule and
conditional probability of event A with known event B:
As from product rule we can write:
P(A ∧ B) = P(A|B)P(B) and
Similarly, the probability of event B with known event A:
P(A ∧ B) = P(B|A)P(A)
Bayes Theorem …
Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B)

This is Bayes' rule. With a partition of hypotheses A1, A2, …,
the denominator expands to Σi P(B|Ai) P(Ai). Worked example with
two hypotheses (priors 0.4 and 0.6, likelihoods 0.00025 and 0.0002):

P(A1|B) = (0.00025)(0.4) / ((0.00025)(0.4) + (0.0002)(0.6)) = 0.4545
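The worked computation above can be reproduced directly from Bayes' rule with two mutually exclusive hypotheses; the priors (0.4, 0.6) and likelihoods (0.00025, 0.0002) are taken from the example:

```python
# Bayes' rule with two mutually exclusive, exhaustive hypotheses A1, A2:
# P(A1 | B) = P(B|A1) P(A1) / (P(B|A1) P(A1) + P(B|A2) P(A2))
likelihood_a1, prior_a1 = 0.00025, 0.4
likelihood_a2, prior_a2 = 0.0002, 0.6
posterior_a1 = (likelihood_a1 * prior_a1) / (
    likelihood_a1 * prior_a1 + likelihood_a2 * prior_a2)
print(round(posterior_a1, 4))  # 0.4545
```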
Bayesian Networks
➢ The Bayesian belief network is a key computer technology for
dealing with probabilistic events and solving problems that
involve uncertainty. We can define a Bayesian network as:
➢ "A Bayesian network is a probabilistic graphical model
which represents a set of variables and their conditional
dependencies using a directed acyclic graph."
➢ It is also called a Bayes network, belief network, decision
network, or Bayesian model.
Bayesian Networks
➢ Bayesian networks are probabilistic, because these networks
are built from a probability distribution, and also use
probability theory for prediction and anomaly detection.
➢ Real world applications are probabilistic in nature, and to
represent the relationship between multiple events, we need a
Bayesian network. It can also be used in various tasks
including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction,
and decision making under uncertainty.
➢ A Bayesian network can be used for building models from data
and experts' opinions, and it consists of two parts:
Directed Acyclic Graph
Table of conditional probabilities.
A Bayesian network graph (Directed Acyclic Graph) is made up
of nodes and Arcs (directed links), where:
➢ Each node corresponds to a random variable, and a
variable can be continuous or discrete.
➢ Arcs or directed arrows represent the causal relationships
or conditional probabilities between random variables.
These directed links or arrows connect pairs of nodes in
the graph.
➢ These links represent that one node directly influences
the other node; if there is no directed link, the nodes
are independent of each other.
➢ In the above diagram X1, X2, X3 and X4 are random variables represented
by the nodes of the network graph.
➢ If we are considering node X3, which is connected with node X1 by a
directed arrow, then node X1 is called the parent of node X3.
➢ Node X4 is independent of node X1.
Conditional Probability Tables- CPTs
➢ The conditional probability tables in the network give the
probabilities for the value of the random variables
depending on the combination of values for the parent
nodes.
➢ Each row must sum to 1.
➢ All variables are Boolean; therefore, if the probability of
a true value is p, the probability of the false value must be 1 − p.
➢ A table for a Boolean variable with k parents contains 2^k
independently specifiable probabilities.
➢ A variable with no parents has only one row, representing
the prior probabilities of each possible value of the
variable.
Joint Probability Distribution
➢Bayesian network is based on Joint probability distribution and
conditional probability. So let's first understand the joint
probability distribution:
Example (alarm network evidence factors):

P(j|a) = 0.90, P(j|¬a) = 0.05
P(m|a) = 0.70, P(m|¬a) = 0.01

Pointwise product P(j|A) P(m|A):
a: 0.90 × 0.70 = 0.63
¬a: 0.05 × 0.01 = 0.0005
Inference by Variable Elimination…
α P(B) ΣE P(E) ΣA P(A|E,B) f1(A)

where f1(A) = P(j|A) P(m|A):

f1(a) = 0.90 × 0.70 = 0.63
f1(¬a) = 0.05 × 0.01 = 0.0005
Inference by Variable Elimination…
α P(B) ΣE P(E) ΣA P(A|E,B) f1(A)

P(A|E,B) (value in parentheses is P(¬a|E,B)):
e, b: 0.95 (0.05)
e, ¬b: 0.29 (0.71)
¬e, b: 0.94 (0.06)
¬e, ¬b: 0.001 (0.999)

f1(A): a: 0.63, ¬a: 0.0005

Summing out A, e.g. for (e, b): 0.95 × 0.63 + 0.05 × 0.0005 ≈ 0.60, giving

f2(E,B):
e, b: 0.60
e, ¬b: 0.18
¬e, b: 0.59
¬e, ¬b: 0.001
Inference by Variable Elimination…
α P(B) ΣE P(E) f2(E,B)

P(E=T) = 0.002, P(E=F) = 0.998, P(B=T) = 0.001, P(B=F) = 0.999

f2(E,B): e, b: 0.60; e, ¬b: 0.18; ¬e, b: 0.59; ¬e, ¬b: 0.001

Summing out E and multiplying by P(B) gives

f3(B):
b: 0.0006
¬b: 0.0013
Inference by Variable Elimination…
α f3(B) → P(B | j, m)

f3(B): b: 0.0006, ¬b: 0.0013

Normalizing:
P(b | j, m) = 0.0006 / (0.0006 + 0.0013) ≈ 0.32
P(¬b | j, m) ≈ 0.68
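The same query can be answered by brute-force enumeration instead of variable elimination. A sketch using the standard burglary-alarm CPT values from these slides; note that exact arithmetic gives P(b | j, m) ≈ 0.284, while the 0.32 above comes from rounding the intermediate factors to two decimal places:

```python
# Exact inference of P(Burglary | JohnCalls=T, MaryCalls=T) in the classic
# burglary-alarm network, by summing out Earthquake and Alarm.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E), keyed (B, E)
P_J = {True: 0.90, False: 0.05}   # P(J=T | A)
P_M = {True: 0.70, False: 0.01}   # P(M=T | A)

def joint(b):
    """Sum the full joint over E and A with evidence J=T, M=T fixed."""
    total = 0.0
    for e in (True, False):
        for a in (True, False):
            p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
            total += P_B[b] * P_E[e] * p_a * P_J[a] * P_M[a]
    return total

evidence = joint(True) + joint(False)
print(round(joint(True) / evidence, 3))  # ~0.284 with exact arithmetic
```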
Fuzzy Logic
Fuzzy logic admits degrees of truth between absolute true and
absolute false, for example:
CERTAINLY YES
POSSIBLY YES
CANNOT SAY
POSSIBLY NO
CERTAINLY NO
Implementation
It can be implemented in systems with various sizes and
capabilities ranging from small micro-controllers to large,
networked, workstation-based control systems.
It can be implemented in hardware, software, or a combination
of both.
In Boolean logic, speed is either slow (Speed = 0) or fast (Speed = 1):

bool speed;
// get the speed
if (speed == 0) {
    // speed is slow
} else {
    // speed is fast
}
FUZZY LOGIC REPRESENTATION
• Every problem must be represented in terms of fuzzy sets.
• What are fuzzy sets?
Slowest [0.00 – 0.25]
Slow [0.25 – 0.50]
Fast [0.50 – 0.75]
Fastest [0.75 – 1.00]
FUZZY LOGIC REPRESENTATION CONT.
Example:
Let's suppose A is a fuzzy set which contains the following elements:
A = {(X1, 0.3), (X2, 0.7), (X3, 0.5), (X4, 0.1)}
Then the complement membership is
μĀ(x) = 1 − μA(x),
so
Ā = {(X1, 0.7), (X2, 0.3), (X3, 0.5), (X4, 0.9)}
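The fuzzy complement can be sketched in a couple of lines; the element names and memberships are the ones from the example above:

```python
# Fuzzy complement: membership of x in the complement is 1 - mu_A(x).
A = {"X1": 0.3, "X2": 0.7, "X3": 0.5, "X4": 0.1}
A_complement = {x: round(1 - mu, 2) for x, mu in A.items()}
print(A_complement)  # {'X1': 0.7, 'X2': 0.3, 'X3': 0.5, 'X4': 0.9}
```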
Inference Engine:
It helps to determine the degree of match between the fuzzy
input and the rules. Based on the percentage of match, it
determines which rules need to be applied according to the
given input field. After this, the applied rules are combined
to develop the control actions.
Fuzzy Logic Systems Architecture
Defuzzification:
At last, the defuzzification process is performed to convert the
fuzzy sets into a crisp value. There are many types of
techniques available, so you need to select the one best
suited for use with the expert system.
Fuzzy logic algorithm
1) Initialization process:
▪ Define the linguistic variables.
▪ Construct the fuzzy logic membership functions that
define the meaning or values of the input and output
terms used in the rules.
▪ Construct the rule base (Break down the control problem
into a series of IF X AND Y, THEN Z rules based on the
fuzzy logic rules).
2)Convert crisp input data to fuzzy values using the
membership functions (fuzzification).
3) Evaluate the rules in the rule base (inference).
4) Combine the results of each rule (inference).
5)Convert the output data to non-fuzzy values
(defuzzification).
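The five steps above can be sketched end-to-end for a hypothetical fan-speed controller; the membership functions, rule base, and crisp output levels are all invented for illustration:

```python
# Minimal fuzzy-logic pipeline: fuzzify -> evaluate rules -> combine -> defuzzify.

def tri(x, a, b, c):
    """Triangular membership function (sketch; degenerate edges return 0)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Step 1: linguistic variables and membership functions (hypothetical).
temp_sets = {"cold": (0, 0, 20), "warm": (10, 25, 40), "hot": (30, 50, 50)}
fan_speed = {"slow": 20.0, "medium": 50.0, "fast": 90.0}  # crisp output levels

# Rule base: IF temperature IS <set> THEN fan IS <set>.
rules = {"cold": "slow", "warm": "medium", "hot": "fast"}

def control(temperature):
    # Step 2: fuzzification of the crisp input.
    degrees = {name: tri(temperature, *abc) for name, abc in temp_sets.items()}
    # Steps 3-4: evaluate the rules and combine their results.
    num = sum(degrees[t] * fan_speed[f] for t, f in rules.items())
    den = sum(degrees.values())
    # Step 5: defuzzification (weighted-average method).
    return num / den if den else 0.0

print(control(35.0))  # a crisp fan speed between "medium" and "fast"
```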
Example: Air conditioner system
controlled by a FLS
The system adjusts the temperature of the room according
to the current temperature of the room and the target value.
The fuzzy engine periodically compares the room
temperature and the target temperature, and produces a
command to heat or cool the room.
Rule table (rows: current room temperature; columns: target temperature):

Room Temp. \ Target | Too Cold | Cold | Warm | Hot | Too Hot
Too Cold | No_Change | Heat | Heat | Heat | Heat
Triangular membership function:

F(x, a, b, c) =
  0, if x < a
  (x – a) / (b – a), if a ≤ x ≤ b
  (c – x) / (c – b), if b ≤ x ≤ c
  0, if c < x
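A direct transcription of this piecewise definition (assuming a < b < c):

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 outside [a, c], peaking at 1 when x == b."""
    if x < a or x > c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

print(triangular(25, 0, 50, 100))  # 0.5
```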
Cont…
Figure: Triangular membership function. The membership value rises from
0 at a to 1 at b, then falls back to 0 at c (x-axis 0–100, membership
values 0–1).
Trapezoidal membership function

F(x, a, b, c, d) =
  0, if x < a
  (x – a) / (b – a), if a ≤ x ≤ b
  1, if b < x < c
  (d – x) / (d – c), if c ≤ x ≤ d
  0, if d < x
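A direct transcription of the trapezoidal definition (assuming a < b < c < d):

```python
def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: 0 outside [a, d], 1 on the plateau [b, c]."""
    if x < a or x > d:
        return 0.0
    if b < x < c:
        return 1.0
    if x <= b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

print(trapezoidal(40, 0, 20, 60, 80))  # 1.0 (on the plateau)
```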
Cont…
Gaussian membership function

μ(x; a, b) = e^(−(x − b)² / (2a²))

Figure: Gaussian membership function (centered at b, with width parameter a).
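The Gaussian membership function, as defined above:

```python
import math

def gaussian(x, a, b):
    """Gaussian membership centered at b with width parameter a."""
    return math.exp(-((x - b) ** 2) / (2 * a ** 2))

print(gaussian(50, 10, 50))  # 1.0 at the center
```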
Applications of Fuzzy Logic
Following are the different application areas where the Fuzzy Logic
concept is widely used:
It is used in Businesses for decision-making support system.
It is used in Automotive systems for controlling traffic and
speed, and for improving the efficiency of automatic transmissions.
Automotive systems also use the shift-scheduling method for
automatic transmissions.
This concept is also used in the Defence in various areas. Defence
mainly uses the Fuzzy logic systems for underwater target
recognition and the automatic target recognition of thermal infrared
images.
It is also widely used in the Pattern Recognition and
Classification in the form of Fuzzy logic-based recognition and
handwriting recognition. It is also used in the searching of fuzzy
images.
Applications of Fuzzy Logic
Fuzzy logic systems are also used in securities.
It is also used in microwave ovens for setting the power and
cooking strategy.
This technique is also used in the area of modern control
systems such as expert systems.
Finance is also another application where this concept is used for
predicting the stock market, and for managing the funds.
It is also used for controlling the brakes.
It is also used in the chemical industry for controlling the pH
and the chemical distillation process.
It is also used in the manufacturing industry for the
optimization of milk and cheese production.
It is also used in the vacuum cleaners, and the timings of washing
machines.
It is also used in heaters, air conditioners, and humidifiers.
Utility Theory and utility functions
Decision theory, in its simplest form, deals with choosing
among actions based on the desirability of their immediate
outcomes
If the agent may not know the current state, we define
RESULT(a) as a random variable whose values are the
possible outcome states. The probability of outcome s′,
given evidence observations e, is written
P(RESULT(a) = s′ | a, e)
where the a on the right-hand side of the conditioning bar
stands for the event that action a is executed.
The agent’s preferences are captured by a utility function,
U(s), which assigns a single number to express the
desirability of a state.
The expected utility of an action given the evidence,
EU(a|e), is just the average utility value of the outcomes,
weighted by the probability that each outcome occurs:
EU(a|e) = Σs′ P(RESULT(a) = s′ | a, e) U(s′)
The principle of maximum expected utility (MEU) says that a
rational agent should choose the action that maximizes the
agent’s expected utility:
action = argmax_a EU(a|e)
In a sense, the MEU principle could be seen as defining all of
AI. All an intelligent agent has to do is calculate the various
quantities, maximize utility over its actions, and away it goes.
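The MEU principle fits in a few lines; the actions, outcome distributions, and utilities below are hypothetical:

```python
# MEU sketch: pick the action with the highest expected utility.
outcomes = {  # action -> list of (P(RESULT(a)=s' | a, e), U(s'))
    "go_left":  [(0.8, 10.0), (0.2, -5.0)],
    "go_right": [(0.5, 20.0), (0.5, -10.0)],
}

def expected_utility(action):
    # EU(a|e) = sum over outcomes of P(outcome) * U(outcome)
    return sum(p * u for p, u in outcomes[action])

best = max(outcomes, key=expected_utility)
print(best, expected_utility(best))  # go_left 7.0
```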
Basis of Utility Theory
Intuitively, the principle of Maximum Expected Utility
(MEU) seems like a reasonable way to make decisions, but
it is by no means obvious that it is the only rational way.
Learning
Machine Learning Paradigms
What is learning?
“Learning denotes changes in a system that ... enable a
system to do the same task more efficiently the next time.”
–Herbert Simon
“Learning is constructing or modifying representations of
what is being experienced.”
–Ryszard Michalski
“Learning is making useful changes in our minds.” –
Marvin Minsky
Paradigms in Machine Learning
A paradigm, as most of us know, is a set of ideas,
assumptions, and values held by an entity, and it shapes
the way that entity interacts with its environment.
For machine learning, this translates into the set of policies
and assumptions inherited by a machine learning algorithm
which dictate how it interacts with both the data inputs and
the user.
Machine Learning
Machine learning is a subset of artificial intelligence
that is mainly concerned with the development of algorithms
which allow a computer to learn from data and past
experiences on its own. The term machine learning was
first introduced by Arthur Samuel in 1959.
We can define it in a summarized way as:
Machine learning enables a machine to automatically
learn from data, improve performance from
experiences, and predict things without being explicitly
programmed.
Machine Learning
With the help of sample historical data, which is known
as training data, machine learning algorithms build
a mathematical model that helps in making predictions or
decisions without being explicitly programmed. Machine
learning brings computer science and statistics together for
creating predictive models. Machine learning constructs or
uses algorithms that learn from historical data. The
more information we provide, the higher the performance
will be.
A machine has the ability to learn if it can improve its
performance by gaining more data.
How does Machine Learning work
A Machine Learning system learns from historical data,
builds the prediction models, and whenever it receives
new data, predicts the output for it. The accuracy of
predicted output depends upon the amount of data, as the
huge amount of data helps to build a better model which
predicts the output more accurately.
Classification of Machine Learning
At a broad level, machine learning can be classified into three
types:
Supervised learning
Unsupervised learning
Reinforcement learning
Supervised Learning
Supervised learning is a type of machine learning method in
which we provide sample labeled data to the machine learning
system in order to train it, and on that basis, it predicts the
output.
The system creates a model using labeled data to understand
the datasets and learn about each data point. Once training
and processing are done, we test the model by providing
sample data to check whether it predicts the correct output.
The goal of supervised learning is to map input data to the
output data. Supervised learning is based on supervision,
just as a student learns under the supervision of a teacher.
An example of supervised learning is spam filtering.
Supervised Learning
Supervised learning can be grouped further in two categories of
algorithms:
Classification
Regression
Classification:
Classification is the process of categorizing a given set of
data into classes. It can be performed on both structured and
unstructured data. The process starts with predicting the class
of given data points. The classes are often referred to as
targets, labels, or categories.
Classification
The classification predictive modeling is the task of
approximating the mapping function from input variables to
discrete output variables. The main goal is to identify which
class/category the new data will fall into.
Heart disease detection can be identified as a classification
problem, this is a binary classification since there can be
only two classes i.e has heart disease or does not have heart
disease. The classifier, in this case, needs training data to
understand how the given input variables are related to the
class. And once the classifier is trained accurately, it can be
used to detect whether heart disease is there or not for a
particular patient.
Classification
Since classification is a type of supervised learning, the
targets are also provided with the input data. Let us get
familiar with classification terminology in machine learning.
Examples of supervised machine learning algorithms for
classification are:
Decision Tree Classifiers
Support Vector Machines
Naive Bayes Classifiers
K Nearest Neighbor
Artificial Neural Networks
Regression
The regression algorithms attempt to estimate the mapping
function (f) from the input variables (x) to numerical or
continuous output variables (y). Now, the output variable could
be a real value, which can be an integer or a floating point
value. Therefore, the regression prediction problems are
usually quantities or sizes.
For example, if you are provided with a dataset about houses,
and you are asked to predict their prices, that is a regression
task because the price will be a continuous output.
Examples of supervised machine learning algorithms for
regression:
Linear Regression
Logistic Regression (despite its name, it is typically used for classification)
Regression Decision Trees
Artificial Neural Networks
Unsupervised Learning
As the name suggests, unsupervised learning is a machine
learning technique in which models are not supervised
using a training dataset. Instead, the models themselves
find the hidden patterns and insights from the given data.
It can be compared to the learning which takes place in the
human brain while learning new things. It can be defined as:
Unsupervised learning is a type of machine learning in
which models are trained using unlabeled dataset and are
allowed to act on that data without any supervision.
Unsupervised learning cannot be directly applied to a
regression or classification problem because unlike
supervised learning, we have the input data but no
corresponding output data.
Unsupervised Learning
The goal of unsupervised learning is to find the
underlying structure of dataset, group that data
according to similarities, and represent that dataset in a
compressed format.
Example: Suppose the unsupervised learning algorithm is
given an input dataset containing images of different types
of cats and dogs. The algorithm is never trained upon the
given dataset, which means it does not have any idea about
the features of the dataset. The task of the unsupervised
learning algorithm is to identify the image features on their
own. Unsupervised learning algorithm will perform this
task by clustering the image dataset into the groups
according to similarities between images.
Decision Tree Example (Play Golf dataset)
Attribute: Temperature
Gain(S,Temperature) = 0.94 − (4/14)(1.0) − (6/14)(0.9183) − (4/14)(0.8113)
= 0.0289
Attribute: Humidity
Values(Humidity) = High, Normal

Day | Outlook | Temperature | Humidity | Wind | Play Golf
D1 | Sunny | Hot | High | Weak | No
D2 | Sunny | Hot | High | Strong | No
D3 | Overcast | Hot | High | Weak | Yes
D4 | Rain | Mild | High | Weak | Yes
D5 | Rain | Cool | Normal | Weak | Yes
D6 | Rain | Cool | Normal | Strong | No
D7 | Overcast | Cool | Normal | Strong | Yes
D8 | Sunny | Mild | High | Weak | No
D9 | Sunny | Cool | Normal | Weak | Yes
D10 | Rain | Mild | Normal | Weak | Yes
D11 | Sunny | Mild | Normal | Strong | Yes
D12 | Overcast | Mild | High | Strong | Yes
D13 | Overcast | Hot | Normal | Weak | Yes
D14 | Rain | Mild | High | Strong | No
Attribute: Wind
Values(Wind) = Strong, Weak
(The dataset is the same Play Golf table shown above.)
Calculating the information gain for all attributes:
Gain(S,Outlook)= 0.2464,
Gain(S,Temperature)= 0.0289
Gain(S,Humidity)=0.1516
Gain(S,Wind) =0.0478
We can clearly see that Gain(S, Outlook) has the highest
information gain of 0.246, hence we choose the Outlook
attribute as the root node.
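The gains quoted above can be recomputed from the Play Golf table; small differences in the last digit come from rounding of the intermediate entropies:

```python
import math

# (Outlook, Temperature, Humidity, Wind, PlayGolf) rows D1..D14.
data = [
    ("Sunny","Hot","High","Weak","No"), ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"), ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"), ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"), ("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"), ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"), ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"), ("Rain","Mild","High","Strong","No"),
]

def entropy(rows):
    """Shannon entropy of the class label (last column)."""
    n = len(rows)
    counts = {}
    for r in rows:
        counts[r[-1]] = counts.get(r[-1], 0) + 1
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(rows, col):
    """Information gain of splitting `rows` on attribute index `col`."""
    remainder = 0.0
    for v in {r[col] for r in rows}:
        sub = [r for r in rows if r[col] == v]
        remainder += len(sub) / len(rows) * entropy(sub)
    return entropy(rows) - remainder

for name, col in [("Outlook", 0), ("Temperature", 1), ("Humidity", 2), ("Wind", 3)]:
    print(name, round(gain(data, col), 4))
```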
Here we observe that whenever the Outlook is Overcast,
Play Golf is always ‘Yes’. This is no coincidence; the
simple tree results because the highest information gain
is given by the attribute Outlook.
Now how do we proceed from this point? We can simply
apply recursion, you might want to look at the algorithm
steps described earlier.
Now that we’ve used Outlook, three attributes remain:
Humidity, Temperature, and Wind. And we had three possible
values of Outlook: Sunny, Overcast, Rain. The Overcast node
already ended up as a leaf node ‘Yes’, so we’re left with
two subtrees to compute: Sunny and Rain.
Attribute: Temperature
Values(Temperature) = Hot, Mild, Cool

Sunny subset (Outlook = Sunny):
Day | Temperature | Humidity | Wind | Play Golf
D1 | Hot | High | Weak | No
D2 | Hot | High | Strong | No
D8 | Mild | High | Weak | No
D9 | Cool | Normal | Weak | Yes
D11 | Mild | Normal | Strong | Yes
Attribute: Humidity
Values(Humidity) = High, Normal
(Same Sunny subset as above.)
Attribute: Wind
Values(Wind) = Strong, Weak
(Same Sunny subset as above.)
Gain(Ssunny,Temperature) = 0.570
Gain(Ssunny,Humidity) = 0.97
Gain(Ssunny,Wind) = 0.0192
Humidity has the highest gain, so it becomes the decision node for the
Sunny branch.
For the Rain subset:
Gain(Srain,Humidity) = 0.0192
Gain(Srain,Wind) = 0.97
Wind has the highest gain, so it becomes the decision node for the
Rain branch.
Decision Tree
Neural Networks
● Artificial neural network (ANN) is a machine learning
approach that models human brain and consists of a number
of artificial neurons.
● Neurons in ANNs tend to have fewer connections than
biological neurons.
● Each neuron in ANN receives a number of inputs.
● An activation function is applied to these inputs which
results in activation level of neuron (output value of the
neuron).
● Knowledge about the learning task is given in the form of
examples called training examples.
Contd..
y = φ(u + b), where u = Σi wi xi is the weighted sum of the
inputs and φ is the activation function.
The Neuron Diagram
Figure: A neuron receives input values x1 … xm weighted by w1 … wm;
a summing function produces the induced field v = Σi wi xi + b
(with bias b), and an activation function φ(·) produces the output y.
Bias of a Neuron
The bias can be treated as an additional weight, w0 = b,
attached to a fixed input x0 = +1.
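A single neuron as described above can be sketched as follows; the step activation and the AND weights are illustrative choices:

```python
# One neuron: weighted sum + bias, then an activation function.
def neuron(inputs, weights, bias):
    u = sum(x * w for x, w in zip(inputs, weights))  # summing function
    v = u + bias                                     # induced field
    return 1 if v >= 0 else 0                        # step activation

# Example: a neuron computing logical AND of two binary inputs.
weights, bias = [1.0, 1.0], -1.5
print([neuron([x1, x2], weights, bias) for x1 in (0, 1) for x2 in (0, 1)])
# [0, 0, 0, 1]
```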
Neural Networks Activation
Functions
Activation functions are mathematical equations that determine
the output of a neural network.
The function is attached to each neuron in the network, and
determines whether it should be activated (“fired”) or not, based
on whether each neuron’s input is relevant for the model’s
prediction.
Activation functions also help normalize the output of each
neuron to a range between 0 and 1 or between -1 and 1.
An additional aspect of activation functions is that they must be
computationally efficient because they are calculated across
thousands or even millions of neurons for each data sample.
Contd..
Modern neural networks use a technique called
backpropagation to train the model, which places an
increased computational strain on the activation function,
and its derivative function.
It is simply a function used to get the output of a node.
It is also known as a Transfer Function.
It is used to determine the output of neural network like yes
or no. It maps the resulting values in between 0 to 1 or -1 to
1 etc. (depending upon the function).
Step Function
A step function is a function like that used by the original
Perceptron.
The output is a certain value, A1, if the input sum is above a
certain threshold and A0 if the input sum is below a certain
threshold.
The values used by the Perceptron were A1 = 1 and A0 = 0.
Figure: Step function.
Linear or Identity Activation
Function
As you can see the function is a line or linear. Therefore,
the output of the functions will not be confined between
any range.
Equation : f(x) = x
Range : (-infinity to infinity)
It doesn’t help with the complexity or various parameters of
usual data that is fed to the neural networks.
Sigmoid or Logistic Activation
Function
The sigmoid function is an activation function which scales
values between 0 and 1 by applying a threshold:
σ(x) = 1 / (1 + e^(−x))
When we apply the weighted sum in place of x, the values are
scaled between 0 and 1.
The beauty of the exponent is that the value never reaches 0
nor exceeds 1.
Large negative numbers are scaled towards 0 and large
positive numbers are scaled towards 1.
Figure: Sigmoid function.
Tanh or Hyperbolic Tangent
Function
The Tanh function is an activation function which rescales
values between -1 and 1 by applying a threshold, just like
the sigmoid function.
Its advantage is that the values of tanh are zero-centered,
which helps the next neuron during propagation.
When we apply the weighted sum of the inputs to tanh(x), it
rescales the values between -1 and 1.
Large negative numbers are scaled towards -1 and large
positive numbers are scaled towards 1.
ReLU(Rectified Linear Unit) :
This is one of the most widely used activation functions.
The benefit of ReLU is sparsity: it passes only positive
values and blocks negative values, which speeds up the
process and reduces the possibility of a dead neuron
occurring.
f(x) = max(0, x)
This function passes only the positive part of its input
during forward propagation.
The drawback of ReLU is that when the gradient hits zero for
negative values, it does not converge towards the minimum,
which results in a dead neuron during backpropagation.
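The three activation functions discussed above, sketched in a few lines:

```python
import math

def sigmoid(x):
    """Logistic sigmoid: squashes x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: squashes x into (-1, 1), zero-centered."""
    return math.tanh(x)

def relu(x):
    """Rectified linear unit: passes positives, blocks negatives."""
    return max(0.0, x)

print(sigmoid(0), tanh(0), relu(-3.0), relu(2.5))
# 0.5 0.0 0.0 2.5
```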
Network Architectures
● Three different classes of network architectures
− single-layer feed-forward
− multi-layer feed-forward
− recurrent
Figure: A 3-4-2 feed-forward network, with an input layer,
one hidden layer, and an output layer.
FFNN for XOR
● The ANN for XOR has two hidden nodes that realizes this non-linear
separation and uses the sign (step) activation function.
● Arrows from input nodes to two hidden nodes indicate the directions of
the weight vectors.
● The output node is used to combine the outputs of the two hidden nodes.
Inputs | Hidden Nodes | Output Node = X1 XOR X2
X1 X2 | H1 (OR) H2 (NAND) | AND(H1, H2)
0 0 | 0 1 | 0
0 1 | 1 1 | 1
1 0 | 1 1 | 1
1 1 | 1 0 | 0
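The XOR network in the table can be written out with step activations; the particular weights and thresholds are one standard hand-set choice (an assumption; many weight settings realize the same truth table):

```python
def step(v):
    """Sign/step activation: fires when the induced field is non-negative."""
    return 1 if v >= 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 0.5)      # hidden node H1: OR
    h2 = step(-x1 - x2 + 1.5)     # hidden node H2: NAND
    return step(h1 + h2 - 1.5)    # output node: AND(H1, H2)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor(x1, x2))
```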
Forward step: network activation.
Backward step: error propagation.
Reinforcement Learning
The agent learns by trying all the possible paths and then choosing the
path which gives it the reward with the least hurdles. Each right step
gives the agent a reward and each wrong step subtracts from the agent's
reward. The total reward is calculated when it reaches the final state,
where the agent gets a +1 reward.
Steps in Reinforcement Learning
Input: The input should be an initial state from which the
model will start.
Output: There are many possible outputs, as there are a
variety of solutions to a particular problem.
Training: The training is based upon the input; the model
will return a state, and the user will decide to reward or
punish the model based on its output.
The model continues to learn.
The best solution is decided based on the maximum reward.
Policy: It is a mapping of an action to every
possible state in the system (sequence of
states).
Optimal Policy: A policy which maximizes
the long term reward.
Active and Passive Reinforcement
Learning
Both active and passive reinforcement learning are types of
Reinforcement Learning.
In case of passive reinforcement learning, the agent’s policy
is fixed which means that it is told what to do.
In contrast to this, in active reinforcement learning, an
agent needs to decide what to do as there’s no fixed policy
that it can act on.
Therefore, the goal of a passive reinforcement learning
agent is to execute a fixed policy (sequence of actions) and
evaluate it while that of an active reinforcement learning
agent is to act and learn an optimal policy.
Passive Reinforcement Learning
Techniques
In this kind of RL, we assume that the agent’s policy
π(s) is fixed.
The agent is therefore bound to do what the policy dictates,
although the outcomes of its actions are probabilistic.
The agent may watch what is happening, so it knows what
states it is reaching and what rewards it gets there.
Techniques:
1. Direct utility estimation
2. Adaptive dynamic programming
3. Temporal difference learning
Active Reinforcement Learning
Techniques
In this kind of RL, we assume that the agent’s policy
π(s) is not fixed.
The agent is therefore not bound to an existing policy; it
tries to act and find an optimal policy that maximizes the
overall reward value.
Techniques:
1. Q-Learning
2. ADP with exploration function
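One of the active-RL techniques listed above, Q-learning, can be sketched on a hypothetical five-state corridor task (all environment details here are invented for illustration):

```python
import random

# Q-learning on a 1-D corridor: states 0..4, reward +1 on reaching state 4.
random.seed(0)
n_states, actions = 5, [-1, +1]          # move left / move right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

for _ in range(500):
    s = 0
    while s != 4:
        # epsilon-greedy action selection
        if random.random() < epsilon:
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), n_states - 1)
        r = 1.0 if s2 == 4 else 0.0
        # Q-learning update: Q(s,a) += alpha * (r + gamma*max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions) - Q[(s, a)])
        s = s2

# The learned greedy policy should prefer moving right in every state.
print([max(actions, key=lambda act: Q[(s, act)]) for s in range(4)])
```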
Applications of Reinforcement
Learning
Robotics for industrial automation.
Business strategy planning
Machine learning and data processing
It helps to create training systems that provide custom
instruction and materials according to the requirements of
students.
Aircraft control and robot motion control