A Case Study With Conditional Probability - Kaggle

This document provides an overview of conditional probability and uses a dataset of monthly rainfall data from Kerala, India from 1901-2018 to demonstrate calculating conditional probabilities. It defines key probability concepts like events, sample space, union and intersection of events. It then loads and preprocesses the rainfall dataset, which will be used to calculate the probability of flooding in a given year given that it rained more than a certain threshold in June or July.

Uploaded by

Anshul Singh


8/13/2021 Notebook

https://www.kaggle.com/lakshya91/a-case-study-with-conditional-probability

Overview

In this post, I'll give a gentle introduction to conditional probability with the help of a real-life
example. Then, I'll extend the idea of calculating conditional probabilities to Bayes Theorem.
Excellent resources on this topic already exist, and this post is in no way a replacement for them; it is
just a supplement showing how these concepts apply to some real-life data.

In probability theory, conditional probability is the probability that an event occurs given
that another event has already occurred. To understand it a little better, we first need to set the
stage by defining a few terms from set theory.

Events and Sample Space

In the simplest terms, an event is a possible result of a random experiment. For example, getting a head
when we toss a coin is one event; drawing a red ball at random from a bag containing 3 black and 5 red
balls is another. We can naturally associate a probability with each event.

The collection of all possible outcomes of an experiment is called the sample space. For a coin toss there
are just two outcomes: head (H) or tail (T). Similarly, rolling a fair die always results in a
number from 1 to 6, hence the sample space is {1, 2, 3, 4, 5, 6}.

Union of events

Consider again the rolling of a fair die where we define two events:

Event A: Getting a number which is divisible by 2

Event B: Getting a number which is divisible by 3

Event A consists of the outcomes {2, 4, 6}, whereas event B consists of {3, 6}. Now if we define
another event C, getting a number which is divisible by either 2 OR 3, its outcomes are the
combination of all the unique elements of A and B: {2, 3, 4, 6}. These events can be shown with
Venn diagrams:


(https://imgur.com/7o5B3u9)

In terms of probabilities, we can easily calculate the probabilities for all the events as follows:

$$P(A) = \frac{\text{Number of cases where the output is divisible by 2}}{\text{Total possible outcomes}} = 3/6 = 0.5$$

$$P(B) = \frac{\text{Number of cases where the output is divisible by 3}}{\text{Total possible outcomes}} = 2/6 = 0.333$$

$$P(C) = P(A \cup B) = \frac{\text{Number of cases where the output is divisible by either 2 OR 3}}{\text{Total possible outcomes}} = 4/6 = 0.667$$
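The same counting can be reproduced with Python sets. This is only a minimal sketch, not part of the original notebook: the names `sample_space`, `event_a`, `event_b`, `event_c` and the helper `probability` are chosen here for illustration, and the helper assumes that all outcomes are equally likely.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space
event_a = {x for x in sample_space if x % 2 == 0}  # divisible by 2
event_b = {x for x in sample_space if x % 3 == 0}  # divisible by 3
event_c = event_a | event_b                        # set union: divisible by 2 OR 3


def probability(event, space):
    """Probability of an event when every outcome in `space` is equally likely."""
    return Fraction(len(event), len(space))


p_a = probability(event_a, sample_space)
p_b = probability(event_b, sample_space)
p_c = probability(event_c, sample_space)

print(f"A = {sorted(event_a)}, P(A) = {p_a}")
print(f"B = {sorted(event_b)}, P(B) = {p_b}")
print(f"A or B = {sorted(event_c)}, P(A or B) = {p_c}")
```

Using `Fraction` keeps the results exact (1/2, 1/3 and 2/3) instead of rounded decimals.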

Intersection of events

Following the events defined previously, we can also define an event D: getting a number which
is divisible by both 2 AND 3, that is, the outcomes common to both events (here only {6}). In
terms of Venn diagrams, it can be shown as:

(https://imgur.com/dwBjCji)

Again, in terms of probabilities:

$$P(A) = \frac{\text{Number of cases where the output is divisible by 2}}{\text{Total possible outcomes}} = 3/6 = 0.5$$

$$P(B) = \frac{\text{Number of cases where the output is divisible by 3}}{\text{Total possible outcomes}} = 2/6 = 0.333$$

$$P(D) = P(A \cap B) = \frac{\text{Number of cases where the output is divisible by both 2 AND 3}}{\text{Total possible outcomes}} = 1/6 = 0.167$$
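The union and intersection probabilities are tied together by the inclusion-exclusion rule. The rule is a standard identity that the original post does not state; it is added here only as a consistency check on the numbers for the die example:

```latex
P(A \cup B) = P(A) + P(B) - P(A \cap B)
            = \frac{3}{6} + \frac{2}{6} - \frac{1}{6}
            = \frac{4}{6} \approx 0.667
```

The outcome 6 belongs to both A and B, so it is counted twice in P(A) + P(B), and subtracting P(A ∩ B) removes the double count.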

Disjoint events

Consider these two events:

Event A: Getting a number which is divisible by 2

Event B: Getting a number which is divisible by 5

Event A consists of the outcomes {2, 4, 6}, whereas event B consists of {5}. We can see that
these events cannot occur together when rolling a fair die. Such events are called disjoint
events. The Venn diagram for this type of event can be shown as:

(https://imgur.com/KqXin9c)

Dependent and Independent events

If the occurrence of one event does not affect the occurrence of another event, then these events are
termed independent events. A few examples of independent events include:

Getting a head when a coin is tossed AND getting a 5 in rolling a fair die
Getting rains in the month of August AND snow in December

For independent events, the probability can be written as P(A ∩ B) = P(A) * P(B), that is, the
probability of both events occurring is just the product of the individual probabilities. To
understand this more concretely, suppose we have a bag containing 3 blue and 5 green balls and we
draw two balls at random with replacement (meaning we put the first ball back in the bag after the
first trial). We define the two events as:

Event A: Getting a blue ball

Event B: Getting a green ball


We are interested in calculating the probability of getting a blue ball in the first trial AND a green ball in
the second. The individual probabilities are P(A) = 3/8 and P(B) = 5/8. Since the first ball is put
back, getting a blue ball in the first trial has no influence on getting a green ball in the second, so
these are independent events and we can write:

P(C) = P(A ∩ B) = P(A) P(B) = P(B) P(A) = 15 / 64

Now for the case of dependent events, we can use the same example with one big difference: we are
not going to replace the ball drawn in the first trial! In this case, P(A) = 3/8 and, once a blue ball
has been removed, the probability of getting a green ball in the second draw is 5/7.
Notice the denominator: since the drawn ball is not replaced, the second draw is made from a smaller
set of 7 balls. This also gives a greater chance of getting the green ball in the second draw.
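The difference between the two sampling schemes can be checked with exact fractions. The snippet below is an illustrative sketch rather than code from the original notebook; all variable names are assumptions made for this example.

```python
from fractions import Fraction

n_blue, n_green = 3, 5
total = n_blue + n_green

# Probability of a blue ball on the first draw (same in both schemes)
p_blue_first = Fraction(n_blue, total)

# With replacement: the bag is unchanged before the second draw
p_green_second_with = Fraction(n_green, total)
p_joint_with = p_blue_first * p_green_second_with

# Without replacement: one blue ball is gone, so only total - 1 balls remain
p_green_second_without = Fraction(n_green, total - 1)
p_joint_without = p_blue_first * p_green_second_without

print(f"With replacement:    P(blue then green) = {p_joint_with}")
print(f"Without replacement: P(blue then green) = {p_joint_without}")
```

With replacement the product is 15/64, matching the result above; without replacement it becomes 15/56, slightly larger because the green balls make up a bigger share of the remaining 7 balls.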

This discussion of dependent events naturally extends to the idea of conditional probability: the
probability of an event A given that another event B has already happened. It is denoted by
P(A|B). To get a feel for it, let's see some examples:

Probability of drawing a diamond from a well-shuffled deck of cards given the drawn card is red
Probability of rain on a given day given that the month is July

We can easily infer that in both examples the two events are dependent on each other.

Conditional probability can be defined as follows:

Probability of event A given event B has already occurred:

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$

We can easily see that the above equation reduces to P(A) for independent events by substituting
P(A ∩ B) = P(A) * P(B).
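To make the definition concrete, the card example mentioned earlier can be worked out by enumerating a standard 52-card deck. This is a sketch for illustration only; the deck representation and variable names are assumptions and do not come from the original notebook.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
red_suits = {"hearts", "diamonds"}

# Every (rank, suit) pair is one equally likely outcome
deck = list(product(ranks, suits))

red = [card for card in deck if card[1] in red_suits]
diamond_and_red = [card for card in red if card[1] == "diamonds"]

p_red = Fraction(len(red), len(deck))                          # P(B)
p_diamond_and_red = Fraction(len(diamond_and_red), len(deck))  # P(A and B)
p_diamond = Fraction(len(diamond_and_red), len(deck))          # every diamond is red, so P(A) = P(A and B)

# Definition of conditional probability: P(A|B) = P(A and B) / P(B)
p_diamond_given_red = p_diamond_and_red / p_red

print(f"P(diamond)       = {p_diamond}")
print(f"P(diamond | red) = {p_diamond_given_red}")
```

Knowing that the card is red doubles the probability of a diamond from 1/4 to 1/2, which is why the two events are dependent.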

Now, in an attempt to make this post a little less boring, let's get our hands dirty and use Python to
understand the concept of conditional probability.

In [1]:

# Import libraries

import numpy as np

import pandas as pd

About the Dataset

The dataset contains the monthly rainfall data from years 1901 to 2018 for the Indian state of Kerala.
Kerala is one of the few states which are usually badly hit by monsoons every year. You can read more
about it in this excellent kernel (https://www.kaggle.com/biphili/india-rainfall-kerala-flood).


In [2]:

# Read the data

df = pd.read_csv("/kaggle/input/kerela-flood/kerala.csv")

df.head()

Out[2]:

  SUBDIVISION YEAR  JAN  FEB  MAR   APR   MAY    JUN    JUL   AUG SE
0 KERALA      1901 28.7 44.7 51.6 160.0 174.7  824.6  743.0 357.5 19
1 KERALA      1902  6.7  2.6 57.3  83.9 134.5  390.9 1205.0 315.8 49
2 KERALA      1903  3.2 18.6  3.1  83.6 249.7  558.6 1022.5 420.2 34
3 KERALA      1904 23.7  3.0 32.2  71.5 235.7 1098.2  725.5 351.8 22
4 KERALA      1905  1.2 22.3  9.4 105.9 263.3  850.2  520.5 293.6 21

In [3]:

# Changing the target column to numeric values

df["FLOODS"] = df["FLOODS"].map({"YES": 1, "NO": 0})

We will need only the columns JUN, JUL, YEAR and FLOODS, since we are interested in
calculating the probability of flooding in a year given that it rained more than a certain threshold (500
mm) in these months. We will create a couple more columns based on these columns.


In [4]:

# Creating binary data for the months of June and July using the rainfall threshold

df["JUN_GT_500"] = (df["JUN"] > 500).astype("int")

df["JUL_GT_500"] = (df["JUL"] > 500).astype("int")

df_small = df.loc[:, ["YEAR", "JUN_GT_500", "JUL_GT_500", "FLOODS"]]

df_small["COUNT"] = 1

df_small.head()

Out[4]:

  YEAR JUN_GT_500 JUL_GT_500 FLOODS COUNT
0 1901          1          1      1     1
1 1902          0          1      1     1
2 1903          1          1      1     1
3 1904          1          1      1     1
4 1905          1          1      0     1

In [5]:

df_small.shape

Out[5]:

(118, 5)

In [6]:

# Creating the tabular data based on the counts

pd.crosstab(df_small["FLOODS"], df_small["JUN_GT_500"])

Out[6]:

JUN_GT_500   0   1
FLOODS
0           19  39
1            6  54


Defining some variables:

P(F) : Probability of flooding

P(J) : Probability of having more than 500 mm rain in June
P(F ∩ J) : Probability of flooding and having more than 500 mm rain in June
P(F|J) : Probability of flooding given it rained more than 500 mm in June

Based on the above table we can easily find these probabilities.

In [7]:

P_F = (6 + 54) / (6 + 54 + 19 + 39)

P_J = (39 + 54) / (6 + 54 + 19 + 39)

P_F_intersect_J = 54 / (6 + 54 + 19 + 39)

print(f"P(F): {P_F}")

print(f"P(J): {P_J}")

print(f"P(F AND J): {P_F_intersect_J}")

P(F): 0.5084745762711864

P(J): 0.788135593220339

P(F AND J): 0.4576271186440678
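The counts above were typed in by hand from the crosstab. As a hedged alternative, the same three probabilities can be read off a table of joint probabilities. The sketch below rebuilds the June crosstab from the printed counts (the real `df_small` is not available here), so `counts`, `joint` and the `p_*` names are assumptions made for illustration. On the real data, `pd.crosstab(..., normalize="all")` should give the `joint` table directly.

```python
import pandas as pd

# Counts copied from the June crosstab output:
# rows are FLOODS (0/1), columns are JUN_GT_500 (0/1)
counts = pd.DataFrame(
    [[19, 39], [6, 54]],
    index=pd.Index([0, 1], name="FLOODS"),
    columns=pd.Index([0, 1], name="JUN_GT_500"),
)

total = counts.to_numpy().sum()
joint = counts / total  # joint probabilities for every (FLOODS, JUN_GT_500) cell

p_f = joint.loc[1].sum()     # marginal P(F): sum of the FLOODS == 1 row
p_j = joint[1].sum()         # marginal P(J): sum of the JUN_GT_500 == 1 column
p_f_and_j = joint.loc[1, 1]  # joint P(F and J): the single (1, 1) cell

print(joint)
print(f"P(F): {p_f}")
print(f"P(J): {p_j}")
print(f"P(F AND J): {p_f_and_j}")
```

Reading marginals as row and column sums avoids retyping the counts and makes it harder to mix up which cells belong to which event.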

Using the formula - P(A|B) = P(A ∩ B) / P(B) we can easily calculate the conditional probability:

In [8]:

# Now calculate the probability of flood given it rained more than 500 mm in June (P(A|B))

P_F_J = P_F_intersect_J / P_J

print(f"P(F|J): {P_F_J}")

P(F|J): 0.5806451612903226


Now, we can also ask ourselves this: given that it flooded in Kerala in a given year, what is the
probability that it rained more than 500 mm in the month of June or July? This is where Bayes
Theorem comes into action. Some other examples where Bayes Theorem applies:

The probability of a woman having breast cancer given she tested positive in the test
Probability that a given email is actually a spam given it contains certain flagged words.

Bayes Theorem can be easily derived using the relationship between conditional probability and
intersection of events. Given two events, we already know:
$$P(A \cap B) = P(A|B)\, P(B) = P(B|A)\, P(A)$$

so,

$$P(B|A) = \frac{P(A|B)\, P(B)}{P(A)}$$

In Bayesian inference, P(B) is called the prior probability. In our case, P(J) is the prior probability:
the probability of more than 500 mm of rain in June (or July) without knowing whether it
flooded that year. In other words, the prior is the probability of the event we are interested in
before any new information is taken into account.

Okay, enough chatter, let's try to code this in Python. Actually, we have already done most of the work;
it's just a matter of plugging the numbers into the above equation.

In [9]:

# Probability of rain more than 500 mm in June given it flooded that year (P(B|A))

P_J_F = (P_F_J * P_J) / P_F

print(f"P(J|F): {P_J_F}")

P(J|F): 0.9000000000000001
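Written out with the June counts from the crosstab, the terms of Bayes Theorem cancel neatly. The worked substitution below is added as a cross-check and is not part of the original notebook:

```latex
P(J|F) = \frac{P(F|J)\, P(J)}{P(F)}
       = \frac{\dfrac{54}{93} \cdot \dfrac{93}{118}}{\dfrac{60}{118}}
       = \frac{54}{60}
       = 0.9
```

The result is simply the share of flood years (60) in which June rainfall exceeded 500 mm (54). The trailing digits in the printed value 0.9000000000000001 come from floating-point rounding, not from the data.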

In [10]:

# We can similarly do it for July

pd.crosstab(df_small["FLOODS"], df_small["JUL_GT_500"])

Out[10]:

JUL_GT_500   0   1
FLOODS
0           19  39
1            3  57


Defining similar parameters for July:

P(F) : Probability of flooding

P(J) : Probability of having more than 500 mm rain in July
P(F ∩ J) : Probability of flooding and having more than 500 mm rain in July
P(F|J) : Probability of flooding given it rained more than 500 mm in July

In [11]:

P_F = (3 + 57) / (3 + 57 + 19 + 39)

P_J = (39 + 57) / (3 + 57 + 19 + 39)

P_F_intersect_J = 57 / (3 + 57 + 19 + 39)

print(f"P(F): {P_F}")

print(f"P(J): {P_J}")

print(f"P(F AND J): {P_F_intersect_J}")

P(F): 0.5084745762711864

P(J): 0.8135593220338984

P(F AND J): 0.4830508474576271

In [12]:

# Now calculate the probability of flood given it rained more than 500 mm in July

P_F_J = P_F_intersect_J / P_J

print(f"P(F|J): {P_F_J}")

P(F|J): 0.59375

In [13]:

# Probability of rain more than 500 mm in July given it flooded that year (P(B|A))

P_J_F = (P_F_J * P_J) / P_F

print(f"P(J|F): {P_J_F}")

P(J|F): 0.9500000000000002
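Since the June and July calculations repeat the same steps, they can be wrapped in one small helper. This is a sketch under stated assumptions: the function name `flood_probabilities` and its argument names are hypothetical, and the four counts are copied from the two crosstab outputs above rather than recomputed from the dataset.

```python
from fractions import Fraction


def flood_probabilities(dry_no_flood, wet_no_flood, dry_flood, wet_flood):
    """Return P(F), P(J), P(F and J), P(F|J) and P(J|F) from a 2x2 count table.

    "wet" means the month had more than 500 mm of rain; "flood" means FLOODS == 1.
    """
    total = dry_no_flood + wet_no_flood + dry_flood + wet_flood
    p_f = Fraction(dry_flood + wet_flood, total)
    p_j = Fraction(wet_no_flood + wet_flood, total)
    p_f_and_j = Fraction(wet_flood, total)
    p_f_given_j = p_f_and_j / p_j            # definition of conditional probability
    p_j_given_f = p_f_given_j * p_j / p_f    # Bayes Theorem
    return {
        "P(F)": p_f,
        "P(J)": p_j,
        "P(F and J)": p_f_and_j,
        "P(F|J)": p_f_given_j,
        "P(J|F)": p_j_given_f,
    }


june = flood_probabilities(dry_no_flood=19, wet_no_flood=39, dry_flood=6, wet_flood=54)
july = flood_probabilities(dry_no_flood=19, wet_no_flood=39, dry_flood=3, wet_flood=57)

for month, result in [("June", june), ("July", july)]:
    for name, value in result.items():
        print(f"{month} {name}: {value} = {float(value):.4f}")
```

Working with `Fraction` gives exact values (for example 9/10 and 19/20 for P(J|F)), which sidesteps the floating-point noise visible in the printed outputs.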


Important Takeaways

1. Based on the probability outputs above, it flooded in about 59% of the years in which it rained
more than 500 mm in July, whereas for June the figure is about 58%. Both are only modestly above the
overall flood probability of about 51%, which means rainfall in June and July alone is not
completely responsible for the flooding in Kerala. This makes sense, since in both 2018 and 2020
the flooding happened in August. Including August in the analysis may provide more insight.
2. Using Bayes Theorem we found that whenever it flooded in Kerala, both June and July had a very
high probability (90% and 95% respectively) of more than 500 mm of rain. This also makes sense,
since June and July are the peak months of rainfall because of the monsoon.

Thanks for reading my kernel! I hope it helped you to understand this concept as much as it helped me.

References and Resources:

https://www.statisticshowto.com/bayes-theorem-problems/
https://www.analyticsvidhya.com/blog/2017/03/conditional-probability-bayes-theorem/
https://towardsdatascience.com/bayes-theorem-the-holy-grail-of-data-science-55d93315defb
