
UNIT-2

Probability theory

Probability theory is a branch of mathematics that deals with the analysis of random
phenomena. It aims to assign numerical values to the likelihood of events occurring. The
main concepts and ideas in probability theory include:

1. Sample Space: The set of all possible outcomes of an experiment.
2. Event: A subset of the sample space.
3. Probability: A measure of the likelihood of an event occurring, represented by a number between 0 and 1.

There are two primary methods to calculate the probability of an event:

1. Theoretical Probability: Calculated by dividing the number of favorable outcomes by the total number of possible outcomes.
2. Experimental Probability: Calculated by conducting an experiment and observing the relative frequency with which the event occurs.

Complementary Probability is the probability that an event will not occur, calculated by
subtracting the probability of the event from 1.
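
As a quick illustration (a minimal sketch in Python, not part of the original notes; the die example and number of trials are arbitrary choices), the snippet below compares the theoretical probability of rolling a 6 with an experimental estimate from simulation, and also computes the complementary probability:

```python
import random

# Theoretical probability: favorable outcomes / total possible outcomes
p_theoretical = 1 / 6

# Experimental probability: relative frequency over many simulated rolls
trials = 100_000
hits = sum(1 for _ in range(trials) if random.randint(1, 6) == 6)
p_experimental = hits / trials

# Complementary probability: probability that the event does not occur
p_complement = 1 - p_theoretical

print(p_theoretical, p_experimental, p_complement)
```

With enough trials, the experimental estimate approaches the theoretical value of about 0.167.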

Several important rules in probability theory help in calculating the probabilities of compound events:

1. Addition Rule (Union): P(A ∪ B) = P(A) + P(B) - P(A ∩ B), where A and B are events.
2. Multiplication Rule (Intersection): P(A ∩ B) = P(A) * P(B|A), where A and B are events, and
P(B|A) is the probability of B given A has occurred.
3. Independence: Two events A and B are independent if P(B|A) = P(B).
4. Conditional Probability: P(A|B) is the probability of A occurring given that B has occurred.
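
These rules can be checked on a small finite sample space. The sketch below (an illustrative Python example, not from the original text; the two-dice setup and the events A and B are assumptions made for the demonstration) verifies the addition and multiplication rules and tests for independence:

```python
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return len(event) / len(omega)

A = {o for o in omega if o[0] % 2 == 0}  # first die is even
B = {o for o in omega if sum(o) == 8}    # the two dice sum to 8

# Addition rule: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert abs(prob(A | B) - (prob(A) + prob(B) - prob(A & B))) < 1e-12

# Multiplication rule: P(A ∩ B) = P(A) * P(B|A)
p_b_given_a = prob(A & B) / prob(A)      # conditional probability P(B|A)
assert abs(prob(A & B) - prob(A) * p_b_given_a) < 1e-12

# Independence: A and B are independent if P(B|A) = P(B)
print("independent:", abs(p_b_given_a - prob(B)) < 1e-12)
```
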
Random variables are functions that assign numerical values to the outcomes of a random
experiment. Probability distributions describe the likelihood of different values that a
random variable can take. Some common probability distributions include:

1. Discrete Distributions: Bernoulli, Binomial, Geometric, Poisson, and Uniform distributions.
2. Continuous Distributions: Uniform, Exponential, Normal (Gaussian), and others.

Probability Density Functions (PDFs) and Cumulative Distribution Functions (CDFs) are used
to describe continuous probability distributions:

1. PDFs provide the probability density at any given point.
2. CDFs give the probability of a random variable being less than or equal to a certain value.

Finally, Expectation (or the mean) and Variance are essential concepts for understanding the
behavior of random variables and their distributions. Expectation is a measure of the central
tendency, while Variance is a measure of the dispersion or spread of a distribution.
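
To make these quantities concrete, the sketch below (an illustrative example assuming SciPy is available; the fair-die and standard-normal choices are arbitrary) computes the expectation and variance of a discrete random variable by hand and evaluates the PDF and CDF of a continuous one:

```python
from scipy.stats import norm

# Expectation and variance of a discrete random variable (a fair die)
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6
expectation = sum(v * p for v, p in zip(values, probs))                    # 3.5
variance = sum((v - expectation) ** 2 * p for v, p in zip(values, probs))  # about 2.92

# PDF and CDF of a continuous distribution (standard normal)
density_at_0 = norm.pdf(0)   # probability density at x = 0 (about 0.399)
prob_below_0 = norm.cdf(0)   # P(X <= 0) = 0.5

print(expectation, variance, density_at_0, prob_below_0)
```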

Bayes rule:
Bayes' Theorem: Bayes' Theorem, named after the Reverend Thomas Bayes, is a
fundamental concept in probability theory that allows us to reverse the conditional
probability relationship between two events. It is particularly useful in situations where we
have prior knowledge or information about one event and want to update our beliefs when
new evidence becomes available.

Bayes' Theorem is given by:

P(A|B) = (P(B|A) * P(A)) / P(B)

Here,

- P(A|B) is the probability of event A occurring given that event B has occurred.
- P(B|A) is the probability of event B occurring given that event A has occurred.
- P(A) is the prior probability of event A, which is the probability of A occurring without
considering B.
- P(B) is the probability of event B occurring, also known as the marginal probability of B.

By using Bayes' Theorem, we can calculate the conditional probability P(A|B), which might
be difficult or impossible to determine directly. This is particularly useful when we want to
update our beliefs about the probability of an event based on new evidence or information.

Bayes' Theorem plays a crucial role in various fields, including statistics, machine learning,
data science, and scientific inquiry. Some common applications include:

- Medical diagnosis: Updating the probability of a disease given a patient's symptoms.
- Hypothesis testing: Updating the probability of a hypothesis being true based on observed data.
- Spam filtering: Updating the probability of an email being spam based on its content and sender characteristics.
- Genetic testing: Updating the probability of a person having a genetic disorder based on family history and genetic markers.
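
As a worked illustration of the medical-diagnosis case (the numbers below are made up purely for the example), suppose a disease has prior probability P(D) = 0.01, a test detects it with probability P(+|D) = 0.95, and produces a false positive with probability P(+|not D) = 0.05. Bayes' theorem then gives the probability of the disease given a positive test:

```python
# Hypothetical numbers chosen only to illustrate Bayes' theorem
p_disease = 0.01             # prior P(D)
p_pos_given_disease = 0.95   # P(+ | D)
p_pos_given_healthy = 0.05   # P(+ | not D)

# Marginal probability of a positive test:
# P(+) = P(+|D) * P(D) + P(+|not D) * P(not D)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' theorem: P(D|+) = P(+|D) * P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

print(round(p_disease_given_pos, 3))  # about 0.161
```

Even with a fairly accurate test, the posterior probability is only about 16%, because the prior probability of the disease is low.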

Concept learning is a fundamental task in machine learning and artificial intelligence that involves learning to classify objects or examples based on their features or attributes. The goal of concept learning is to identify and generalize patterns from data in order to make predictions or decisions about new, unseen instances.

Here are the key components and steps involved in concept learning:

Examples: Concept learning begins with a set of examples or instances that are labeled with
their corresponding classes or categories. These examples serve as the training data for the
learning algorithm.

Features or Attributes: Each example is described by a set of features or attributes that capture relevant information about the object or instance. These features could be numerical, categorical, or symbolic.
Hypothesis Space: The hypothesis space represents the set of possible concepts or
classifiers that the learning algorithm can consider. It defines the range of hypotheses that
the algorithm will explore to find the best concept that fits the data.

Training Algorithm: The training algorithm is used to search the hypothesis space and find a
concept that best fits the training data. This involves evaluating and comparing different
hypotheses based on how well they explain the examples.

Generalization: Once a concept is learned from the training data, the goal is to generalize
this concept to new, unseen examples. Generalization ensures that the learned concept can
accurately classify instances that were not part of the training set.

Evaluation: The learned concept is evaluated using performance metrics such as accuracy,
precision, recall, and F1 score to assess its effectiveness in classifying new instances.
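
A minimal sketch of these steps in code (assuming scikit-learn is installed; the Iris dataset and the decision-tree learner are only convenient stand-ins for the examples, hypothesis space, and training algorithm described above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, f1_score

# Examples: labeled instances described by numerical features
X, y = load_iris(return_X_y=True)

# Hold out unseen examples to check generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Training algorithm searching a hypothesis space (here, decision trees)
model = DecisionTreeClassifier().fit(X_train, y_train)

# Evaluation on instances that were not part of the training set
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("f1 (macro):", f1_score(y_test, y_pred, average="macro"))
```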

Concept learning can be supervised, unsupervised, or semi-supervised, depending on the availability of labeled data during the learning process. Supervised concept learning involves learning from labeled examples, while unsupervised learning involves discovering patterns and structures in unlabeled data. Semi-supervised learning combines both labeled and unlabeled data for learning concepts.

Bayes' Theorem:
Bayes' theorem states that the probability of a hypothesis (class) given the evidence
(features) is proportional to the probability of the evidence given the hypothesis, multiplied
by the prior probability of the hypothesis, and divided by the probability of the evidence.

Mathematically, it is represented as:

P(class|features) = (P(features|class) * P(class)) / P(features)

EXAMPLE
Suppose we have a dataset of weather conditions and a corresponding target
variable "Play". Using this dataset, we need to decide whether the player should
play or not on a particular day, according to the weather conditions. To solve this
problem, we follow the steps below:
1. Convert the given dataset into frequency tables.
2. Generate a likelihood table by finding the probabilities of the given features.
3. Use Bayes' theorem to calculate the posterior probability.
Problem: If the weather is sunny, should the player play or not?
Solution: The frequencies implied by the dataset are: 14 instances in total, of which 10 are "Yes" and 4 are "No"; of the 5 "Sunny" instances, 3 are "Yes" and 2 are "No".
Applying Bayes' theorem:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/10 = 0.3
P(Sunny) = 0.35
P(Yes) = 0.71
So P(Yes|Sunny) = 0.3 * 0.71 / 0.35 = 0.60
P(No|Sunny) = P(Sunny|No) * P(No) / P(Sunny)
P(Sunny|No) = 2/4 = 0.5
P(No) = 0.29
P(Sunny) = 0.35
So P(No|Sunny) = 0.5 * 0.29 / 0.35 = 0.41
Since P(Yes|Sunny) > P(No|Sunny), the player should play on a sunny day.
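
The same calculation can be written as a short script (a sketch only; the counts below match the frequency summary given in the solution above):

```python
# Counts matching the frequencies used in the example above
n_total = 14
n_yes, n_no = 10, 4
n_sunny_yes, n_sunny_no = 3, 2

p_yes, p_no = n_yes / n_total, n_no / n_total
p_sunny = (n_sunny_yes + n_sunny_no) / n_total

# Bayes' theorem for each class
p_yes_given_sunny = (n_sunny_yes / n_yes) * p_yes / p_sunny
p_no_given_sunny = (n_sunny_no / n_no) * p_no / p_sunny

print(round(p_yes_given_sunny, 2), round(p_no_given_sunny, 2))
print("Play" if p_yes_given_sunny > p_no_given_sunny else "Don't play")
```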
Naive Independence Assumption: The Naive Bayes algorithm assumes that all features are
conditionally independent given the class label. This means that the presence of a particular
feature in an instance is independent of the presence of other features, given the class label.

The Naive Bayes algorithm is computationally efficient, especially for high-dimensional data,
and can perform well even with relatively small training datasets. However, its assumption of
feature independence may not hold true in all cases, leading to potential inaccuracies,
especially when features are correlated.
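
In formula form, for features x1, x2, ..., xn and class label y, the naive independence assumption means:

P(x1, x2, ..., xn | y) = P(x1 | y) * P(x2 | y) * ... * P(xn | y)

so the posterior probability of each class is proportional to P(y) multiplied by this product, and the class with the largest value is predicted.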

Bayesian belief Networks:

A Bayesian belief network is a graphical representation of the probabilistic relationships among the random variables in a particular set. Each node is conditionally independent of its non-descendants given its parents, so the joint probability represented by the network is built from conditional probabilities of the form P(attribute | parent), i.e. the probability of an attribute given the values of its parent attributes.
Consider this example:
An alarm 'A' is installed in the house of a person 'gfg'. The alarm node has two parent nodes, burglary 'B' and fire 'F', because it rings when either of those events occurs. The alarm node is in turn the parent of two nodes, 'P1' and 'P2', representing two people who may call 'gfg' upon hearing the alarm. There are, however, uncertainties in this chain: 'P1' may sometimes forget to call 'gfg' even after hearing the alarm, as he tends to forget things quickly, and 'P2' may sometimes fail to call 'gfg', as he can only hear the alarm from a certain distance.
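
A minimal sketch of how the joint probability factorizes in this network (the conditional probability values below are invented purely for illustration and are not from the original notes):

```python
# Illustrative (made-up) probabilities for the burglary/fire/alarm network
p_b = 0.001                                       # P(Burglary)
p_f = 0.002                                       # P(Fire)
p_a = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm | B, F)
       (False, True): 0.29, (False, False): 0.001}
p_p1 = {True: 0.90, False: 0.05}                  # P(P1 calls | Alarm)
p_p2 = {True: 0.70, False: 0.01}                  # P(P2 calls | Alarm)

def joint(b, f, a, p1, p2):
    """P(B, F, A, P1, P2) = P(B) * P(F) * P(A|B,F) * P(P1|A) * P(P2|A)."""
    pb = p_b if b else 1 - p_b
    pf = p_f if f else 1 - p_f
    pa = p_a[(b, f)] if a else 1 - p_a[(b, f)]
    pp1 = p_p1[a] if p1 else 1 - p_p1[a]
    pp2 = p_p2[a] if p2 else 1 - p_p2[a]
    return pb * pf * pa * pp1 * pp2

# Probability that both people call and the alarm rings with no burglary or fire
print(joint(False, False, True, True, True))
```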

Expectation-Maximization Algorithm:

In real-world applications of machine learning, it is very common that many relevant features are available for learning, but only a small subset of them is observable. For variables that are sometimes observable and sometimes not, we can use the instances in which the variable is observed for learning, and then predict its value in the instances in which it is not observable.

On the other hand, the Expectation-Maximization (EM) algorithm can also be used for latent variables (variables that are not directly observable and are instead inferred from the values of other observed variables), provided the general form of the probability distribution governing those latent variables is known. This algorithm underlies many unsupervised clustering algorithms in machine learning. It was proposed, explained, and given its name in a 1977 paper by Arthur Dempster, Nan Laird, and Donald Rubin. It is used to find local maximum-likelihood parameters of a statistical model when latent variables are involved and the data is missing or incomplete.

Algorithm:
1. Given a set of incomplete data, consider a set of starting parameters.
2. Expectation step (E – step): Using the observed available data of the
dataset, estimate (guess) the values of the missing data.
3. Maximization step (M – step): Complete data generated after the
expectation (E) step is used in order to update the parameters.
4. Repeat step 2 and step 3 until convergence.
The essence of the Expectation-Maximization algorithm is to use the available
observed data of the dataset to estimate the missing data, and then to use that
data to update the values of the parameters. Let us understand the EM
algorithm in detail.
• Initially, a set of initial values of the parameters is considered. A set of
incomplete observed data is given to the system, with the assumption that the
observed data comes from a specific model.
• The next step is known as “Expectation” – step or E-step. In this step, we use
the observed data in order to estimate or guess the values of the missing or
incomplete data. It is basically used to update the variables.
• The next step is known as “Maximization”-step or M-step. In this step, we
use the complete data generated in the preceding “Expectation” – step in
order to update the values of the parameters. It is basically used to update the
hypothesis.
• Now, in the fourth step, it is checked whether the values are converging;
if they are, stop, otherwise repeat step 2 and step 3, i.e. the “Expectation”
step and the “Maximization” step, until convergence occurs.
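
A compact sketch of these four steps for a two-component 1D Gaussian mixture (assuming NumPy; the synthetic data, starting parameters, and fixed iteration count are arbitrary choices made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data drawn from two Gaussians; in practice the component labels are unknown
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(5, 1, 200)])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Step 1: starting parameters (means, standard deviations, mixing weights)
mu, sigma, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

for _ in range(100):
    # Step 2 (E-step): estimate responsibilities, the posterior probability
    # that each data point belongs to each component
    weighted = np.stack([pi[k] * normal_pdf(data, mu[k], sigma[k]) for k in range(2)])
    resp = weighted / weighted.sum(axis=0)

    # Step 3 (M-step): update the parameters using the expected complete data
    nk = resp.sum(axis=1)
    mu = (resp * data).sum(axis=1) / nk
    sigma = np.sqrt((resp * (data - mu[:, None]) ** 2).sum(axis=1) / nk)
    pi = nk / len(data)
    # Step 4: a convergence check would go here; a fixed number of iterations is used instead

print(mu, sigma, pi)
```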
Flow chart for EM algorithm
Usage of EM algorithm
• It can be used to fill the missing data in a sample.
• It can be used as the basis of unsupervised learning of clusters.
• It can be used for the purpose of estimating the parameters of Hidden Markov
Model (HMM).
• It can be used for discovering the values of latent variables.
Advantages of EM algorithm
• The likelihood is guaranteed to increase with each iteration.
• The E-step and M-step are often straightforward to implement for many problems.
• Solutions to the M-step often exist in closed form.
