
MIT Information Retrieval_Question

The document discusses a spam filter using a Naïve Bayes classifier to evaluate the probability of an email being spam based on the presence of the word 'discount.' It calculates the total probability of the word 'discount' and then applies Bayes' Theorem to find that the probability of an email being spam given it contains 'discount' is 56.25%. The calculations show that 32% of all emails contain 'discount,' leading to the final probability result.

Uploaded by

Mohamed Sido

MIT Information Retrieval: Question

Name: MOHAMMED ABUSIDU


University: President University
Course: Information Retrieval
Instructor: HASANUL FAHMI

Question:

A spam filter uses a Naïve Bayes classifier to determine whether an email is spam based on the presence
of certain words. Suppose we have the following probabilities:

● P(Spam) = 0.3 (30% of all emails are spam)
● P(Not Spam) = 0.7 (70% of all emails are not spam)
● P("discount" | Spam) = 0.6 (60% of spam emails contain the word "discount")
● P("discount" | Not Spam) = 0.2 (20% of non-spam emails contain the word "discount")

If an email contains the word "discount," what is the probability that it is spam?

Answer:

● Compute P("discount") (total probability of the word "discount")

P("discount") = P("discount" | Spam) × P(Spam) + P("discount" | Not Spam) × P(Not Spam)

= (0.6 × 0.3) + (0.2 × 0.7)

= 0.18 + 0.14 = 0.32

This means that 32% of all emails contain the word "discount".
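The total-probability step can be checked numerically. The following is a minimal Python sketch, not part of the original answer; the variable names and the helper `total_probability` are hypothetical, and the numbers are the ones given in the question.

```python
# Values taken from the problem statement.
p_spam = 0.3
p_not_spam = 0.7
p_discount_given_spam = 0.6
p_discount_given_not_spam = 0.2


def total_probability(likelihoods, priors):
    """Law of total probability: sum of P(word | class) * P(class) over all classes.

    Hypothetical helper written for this illustration only.
    """
    return sum(lik * prior for lik, prior in zip(likelihoods, priors))


p_discount = total_probability(
    [p_discount_given_spam, p_discount_given_not_spam],
    [p_spam, p_not_spam],
)

# Format to two decimals to hide floating-point rounding noise.
print(f"P(discount) = {p_discount:.2f}")  # prints "P(discount) = 0.32"
```

The result is formatted to two decimals because binary floating point cannot represent 0.14 exactly, so the raw sum may differ from 0.32 in the last digits.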

● Compute P(Spam | "discount") using Bayes' theorem

P(Spam | "discount") = [P("discount" | Spam) × P(Spam)] / P("discount")

= (0.6 × 0.3) / 0.32

= 0.18 / 0.32

= 0.5625

The probability that an email is spam given that it contains the word "discount" is:

P(Spam | "discount") = 0.5625, or 56.25%
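
As a cross-check, the whole calculation can be sketched in a few lines of Python. This is an illustrative sketch under the two-class assumption of the question (every email is either spam or not spam); the function name `posterior_spam` and its parameter names are hypothetical and do not come from the original answer.

```python
def posterior_spam(p_spam, p_word_given_spam, p_word_given_not_spam):
    """Return P(Spam | word) via Bayes' theorem for two classes.

    Hypothetical helper: assumes 'Spam' and 'Not Spam' are the only classes,
    so P(Not Spam) = 1 - P(Spam).
    """
    p_not_spam = 1.0 - p_spam
    # Evidence P(word), from the law of total probability.
    evidence = p_word_given_spam * p_spam + p_word_given_not_spam * p_not_spam
    # Bayes' theorem: likelihood * prior / evidence.
    return p_word_given_spam * p_spam / evidence


# Numbers from the question: P(Spam) = 0.3, P("discount" | Spam) = 0.6,
# P("discount" | Not Spam) = 0.2.
p = posterior_spam(0.3, 0.6, 0.2)
print(f"P(Spam | discount) = {p:.4f} ({p:.2%})")
# prints "P(Spam | discount) = 0.5625 (56.25%)"
```

The complement, 1 − 0.5625 = 0.4375, is the probability that an email containing "discount" is not spam, which matches 0.14 / 0.32 from the total-probability step.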

Thank you.
