Notes On ML
May 14, 2024 Vel Tech Rangarajan Dr. Sagunthala R & D Institute of Science and Technology 1
PROBABILITY THEORY
For example, when we toss a coin, we get either Head or Tail; only
two outcomes are possible (H, T). But if we toss two coins together,
there are four possible outcomes: both coins show heads, both show
tails, or one shows heads and the other tails (HH, TT, HT, TH).
P(A) = n(A)/n(S)
where n(A) is the number of outcomes in event A and n(S) is the total number of outcomes in the sample space S.
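The two-coin example makes the formula concrete; a minimal sketch in Python (the choice of event A is illustrative):

```python
from itertools import product

# Sample space for tossing two coins: every ordered pair of H/T.
S = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

# Event A: both coins show the same face (both heads or both tails).
A = [outcome for outcome in S if outcome[0] == outcome[1]]

# P(A) = n(A) / n(S)
p_A = len(A) / len(S)
print(p_A)  # 0.5
```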
BAYES RULE
Bayes' rule states:
P(A|B) = P(B|A) P(A) / P(B)
It tells us how often A happens given that B happens, written P(A|B),
when we know how often B happens given that A happens, written P(B|A).
Let us say P(Fire) means how often there is fire, and P(Smoke) means
how often we see smoke, then:
P(Fire|Smoke) means how often there is fire when we can see smoke
P(Smoke|Fire) means how often we can see smoke when there is fire
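A short sketch of the fire-and-smoke calculation; the three numbers below are assumed for illustration and are not from the notes:

```python
# Illustrative numbers (assumptions, not from the notes):
# dangerous fires are rare, smoke is fairly common, and most
# dangerous fires produce smoke.
p_fire = 0.01              # P(Fire)
p_smoke = 0.10             # P(Smoke)
p_smoke_given_fire = 0.90  # P(Smoke|Fire)

# Bayes' rule: P(Fire|Smoke) = P(Smoke|Fire) * P(Fire) / P(Smoke)
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
print(p_fire_given_smoke)  # ~0.09: seeing smoke means fire only about 9% of the time
```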
Example - 1:
Imagine 100 people at a party. You tally how many wear pink or not,
and whether each is a man or not, and get these numbers:

              Pink   Not pink   Total
  Men            5         35      40
  Not men       20         40      60
  Total         25         75     100
From the tally:
P(Man) = 0.4,
P(Pink) = 0.25, and
P(Pink|Man) = 5/40 = 0.125.
By Bayes' rule:
P(Man|Pink) = P(Pink|Man) P(Man) / P(Pink) = 0.125 x 0.4 / 0.25 = 0.2
If we still had the raw data we could calculate this directly: 5/25 = 0.2.
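The same answer follows from Bayes' rule using only the three probabilities above; a quick check in Python:

```python
p_man = 0.4               # P(Man): 40 of the 100 guests are men
p_pink = 0.25             # P(Pink): 25 guests wear pink
p_pink_given_man = 0.125  # P(Pink|Man): 5 of the 40 men wear pink

# Bayes' rule: P(Man|Pink) = P(Pink|Man) * P(Man) / P(Pink)
p_man_given_pink = p_pink_given_man * p_man / p_pink
print(p_man_given_pink)  # 0.2 -- matches 5/25 from the raw counts
```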
INDEPENDENCE
Suppose I roll a fair die; let A be an event with P(A) = 1/3 (say, the
die shows 1 or 2). Also suppose I toss a fair coin; let B be the event
that it lands heads.
What is P(A|B)?
Since the coin toss tells us nothing about the die, A and B are
two independent events. Two events are independent if the occurrence
of one does not affect the probability of the other.
To see this formally, for independent A and B:
P(A|B) = P(A ∩ B)/P(B)
       = P(A)P(B)/P(B)
       = P(A)
Thus, if two events A and B are independent and P(B) ≠ 0, then
P(A|B) = P(A). To summarize, we can say "independence means we can
multiply the probabilities of events to obtain the probability of their
intersection", or equivalently, "independence means that conditional
probability of one event given another is the same as the original
(prior) probability".
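Both characterizations can be checked by enumerating a joint experiment. A minimal sketch, assuming A is "the die shows 1 or 2" so that P(A) = 1/3, and using exact fractions to avoid rounding:

```python
from fractions import Fraction
from itertools import product

# Joint sample space of one fair die roll and one fair coin toss;
# all 12 outcomes are equally likely.
S = list(product(range(1, 7), "HT"))

A = {s for s in S if s[0] <= 2}    # die shows 1 or 2, so P(A) = 1/3
B = {s for s in S if s[1] == "H"}  # coin lands heads, so P(B) = 1/2

p_A = Fraction(len(A), len(S))
p_B = Fraction(len(B), len(S))
p_AB = Fraction(len(A & B), len(S))

print(p_AB == p_A * p_B)  # True: P(A ∩ B) = P(A)P(B)
print(p_AB / p_B == p_A)  # True: P(A|B) = P(A)
```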
CONDITIONAL INDEPENDENCE
Two events A and B are conditionally independent given a third event C
if P(A ∩ B | C) = P(A|C) P(B|C). In that case, for P(B|C) ≠ 0:
P(A|B,C) = P(A ∩ B|C) / P(B|C)
         = P(A|C) P(B|C) / P(B|C)
         = P(A|C)
A box contains two coins: a regular coin and one fake two-headed
coin (P(H) = 1). I choose a coin at random and toss it twice; let H1
and H2 be the events that the first and second toss land heads.
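A sketch of this experiment, assuming the events of interest are H1 = "the first toss lands heads" and H2 = "the second toss lands heads": given the chosen coin the two tosses are independent, but unconditionally they are not.

```python
from fractions import Fraction

half = Fraction(1, 2)

# Each coin is picked with probability 1/2; P(heads) for each coin:
p_head = {"regular": half, "fake": Fraction(1)}

# Law of total probability over the coin choice:
p_h1 = sum(half * p for p in p_head.values())        # P(H1) = 3/4 (same for H2)
p_h1h2 = sum(half * p * p for p in p_head.values())  # P(H1 and H2) = 5/8

# Unconditionally the tosses are NOT independent...
print(p_h1h2 == p_h1 * p_h1)  # False: 5/8 != 9/16
# ...but given which coin was picked they are: P(H1 and H2 | coin)
# equals P(H | coin) * P(H | coin) for each coin, by construction.
```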
Probability distribution
Conditional probability
Joint probability
Given two random variables that are defined on the same probability
space, the joint probability is the probability of the two variables
taking specific values simultaneously.
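These three notions can be illustrated with the party tally from earlier; the 20/40 split among the non-men is inferred from the totals (25 pink overall, 5 of them men):

```python
# Joint tally of two random variables X (man or not) and Y (pink or not)
# for the 100-guest party table.
counts = {
    ("man", "pink"): 5,
    ("man", "not pink"): 35,
    ("not man", "pink"): 20,
    ("not man", "not pink"): 40,
}
total = sum(counts.values())  # 100

# Joint probability: P(X = man, Y = pink).
p_joint = counts[("man", "pink")] / total  # 0.05

# Marginal distribution of X: sum the joint over all values of Y.
p_man = sum(c for (x, _), c in counts.items() if x == "man") / total  # 0.4

# Conditional probability: P(Y = pink | X = man) = joint / marginal.
p_pink_given_man = p_joint / p_man  # 0.125

print(p_joint, p_man, p_pink_given_man)  # 0.05 0.4 0.125
```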
Suppose that you are allowed to flip the coin 10 times in order to
determine the fairness of the coin. Your observations from the
experiment will fall under one of the following cases:
Case 1: heads and tails appear with roughly equal frequency.
Case 2: one side appears far more often than the other.
If case 1 is observed, you are now more certain that the coin is a fair
coin, and you will decide that the probability of observing heads
is 0.5 with more confidence.
Letting h be the number of heads observed, you can decide the
probability of observing heads in one of two ways:
1. Conclude that the probability of observing heads is h/10, using
only the frequency in your observations.
2. Adjust your belief according to the value of h that you have just
observed, and decide the probability of observing heads using both
your prior belief and your recent observations.
The first method suggests that we use the frequentist method, where
we omit our beliefs when making decisions. However, the second
method seems more sensible, because tossing a coin only 10 times is
insufficient to determine its fairness. Therefore, we can
make better decisions by combining our recent observations and
beliefs that we have gained through our past experiences.
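One common way to combine a prior belief with the 10 observed flips is to place a Beta prior on the heads probability; the pseudo-counts a = b = 5 below are an assumed choice for illustration, not part of the notes:

```python
# Estimating the heads probability after N = 10 flips with h heads.
# A Beta(a, b) prior encodes the prior belief that the coin is roughly
# fair; the posterior is Beta(a + h, b + N - h).
N, h = 10, 6
a, b = 5, 5  # assumed prior pseudo-counts, centered on p = 0.5

freq_estimate = h / N                   # method 1: observations only
posterior_mean = (a + h) / (a + b + N)  # method 2: prior + observations

print(freq_estimate)   # 0.6  -- jumps around with so few flips
print(posterior_mean)  # 0.55 -- pulled toward the prior belief of 0.5
```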
Here’s Bayes’ theorem, which is the basis for the Naive Bayes algorithm:
P(A|B) = P(B|A) P(A) / P(B)
In this equation, ‘A’ stands for the class, and ‘B’ stands for the
attributes. P(A|B) is the posterior probability of the class given the
predictor. P(B) is the prior probability of the predictor, and P(A) is
the prior probability of the class. P(B|A) is the probability of the
predictor given the class.
Consider an example dataset that records whether cars were stolen
according to their features. Each row is an individual entry, and the
columns represent the features of every car. In the first row, we have a
stolen Red Sports Car with Domestic Origin. We’ll find out if thieves
would steal a Red Domestic SUV or not (our dataset doesn’t have an
example of a Red Domestic SUV).
Here, y stands for the class variable (Was it Stolen?) to show whether
the thieves stole the car or not given the conditions. X stands for the
features.
Now, we’ll replace X with the individual features and expand using the
chain rule. You can get the value of each term by counting in the
dataset. The denominator remains the same for every entry in the
dataset, so we can remove it and work with proportionality:
y = argmax_y P(y) ∏_{i=1}^{n} P(x_i | y)
For the Red Domestic SUV:
P(Yes | Red, SUV, Domestic) ∝ P(Red|Yes) x P(SUV|Yes) x P(Domestic|Yes)
  = 3/5 x 1/5 x 2/5 x 1
  = 0.048
P(No | Red, SUV, Domestic) ∝ P(Red|No) x P(SUV|No) x P(Domestic|No)
  = 2/5 x 3/5 x 3/5 x 1
  = 0.144
Since 0.144 > 0.048, the Red Domestic SUV is classified as not stolen.
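The arithmetic above can be reproduced in a few lines; the per-class likelihoods are an assumed reconstruction of the dataset counts behind the fractions in the notes:

```python
# Scoring both classes for X = (Red, SUV, Domestic). The likelihoods
# below are assumed reconstructions of the dataset counts.
likelihoods = {
    "Yes": {"Red": 3/5, "SUV": 1/5, "Domestic": 2/5},
    "No":  {"Red": 2/5, "SUV": 3/5, "Domestic": 3/5},
}

scores = {}
for label, probs in likelihoods.items():
    score = 1.0
    for feature in ("Red", "SUV", "Domestic"):
        score *= probs[feature]  # multiply P(x_i | y) over the features
    scores[label] = score

print(round(scores["Yes"], 3))      # 0.048
print(round(scores["No"], 3))       # 0.144
print(max(scores, key=scores.get))  # No: the Red Domestic SUV is predicted not stolen
```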