Probability and Statistics For Data Science

The document discusses probability and statistics concepts for data science. It provides a list of recommended textbooks and reference books on the topics of probability, statistics, and data science. It also lists some online resources and readings that will be referenced in the lecture notes. The document is authored by Dr. Faisal Bukhari from Punjab University College of Information Technology and contains material adapted from the listed resources.

Probability and Statistics for Data Science
Dr. Faisal Bukhari
Punjab University College of Information Technology
(PUCIT)
Textbooks
• Probability & Statistics for Engineers & Scientists, Ninth Edition, Ronald E. Walpole, Raymond H. Myers
• Elementary Statistics: Picturing the World, 6th Edition, Ron Larson and Betsy Farber
• Elementary Statistics, 13th Edition, Mario F. Triola

Dr. Faisal Bukhari, PUCIT, PU, Lahore 2


Reference books
• Probability and Statistical Inference, Ninth Edition, Robert V. Hogg, Elliot A. Tanis, Dale L. Zimmerman
• Probability Demystified, Allan G. Bluman
• Schaum's Outline of Probability, Second Edition, Seymour Lipschutz, Marc Lipson
• Python for Probability, Statistics, and Machine Learning, José Unpingco
• Practical Statistics for Data Scientists: 50 Essential Concepts, Peter Bruce and Andrew Bruce
• Think Stats: Probability and Statistics for Programmers, Allen Downey

References
Readings for these lecture notes:
• Probability & Statistics for Engineers & Scientists, Ninth Edition, Ronald E. Walpole, Raymond H. Myers
• Probability Demystified, Allan G. Bluman
• Elementary Statistics: Picturing the World, 6th Edition, Ron Larson and Betsy Farber
• https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/conditional-probability-definition-examples/#:~:text=Conditional%20probability%20is%20the%20probability,probability%20of%200.5%20(50%25).
• https://en.wikipedia.org/wiki/Contingency_table#:~:text=In%20statistics%2C%20a%20contingency%20table,%2C%20engineering%2C%20and%20scientific%20research.

These notes contain material from the above resources.


Independent and Dependent Events [1]
Two events A and B are independent if and only if P(B|A)
= P(B) or P(A|B) = P(A), assuming the conditional
probabilities exist. Otherwise, A and B are dependent.
OR
Two events, A and B, are said to be independent if the
fact that event A occurs does not affect the probability
that event B occurs.

Both definitions rely on conditional probability: the
probability of an event occurring, given that another
event has already occurred. The conditional probability
of event B occurring, given that event A has occurred, is
denoted by P(B|A) and is read as “probability of B, given A.”

Independent and Dependent Events [2]
Example 1: If a coin is tossed and then a die is rolled,
the outcome of the coin in no way affects or
changes the probability of the outcome of the die.
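The coin-and-die example can be checked by counting outcomes. The sketch below is only an illustration: it assumes a fair coin and a fair die, and the helper names (prob, is_tail, is_five) are made up for this example. It compares P(B) with P(B|A), where A is "tail on the coin" and B is "5 on the die".

```python
from fractions import Fraction
from itertools import product

# All 12 equally likely (coin, die) outcomes (fair coin and fair die assumed).
sample_space = list(product("HT", range(1, 7)))

def prob(event, given=lambda outcome: True):
    """P(event | given), counted over equally likely outcomes."""
    reduced = [o for o in sample_space if given(o)]
    favourable = [o for o in reduced if event(o)]
    return Fraction(len(favourable), len(reduced))

def is_tail(outcome):
    return outcome[0] == "T"

def is_five(outcome):
    return outcome[1] == 5

p_b = prob(is_five)                    # P(B)
p_b_given_a = prob(is_five, is_tail)   # P(B|A)
print(p_b, p_b_given_a)                # equal values mean A and B are independent
```

Because the two values agree, knowing the coin result does not change the probability of the die result.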


Independent and Dependent Events [3]

Example 2: Selecting a card from a deck, replacing it,
and then selecting a second card from the deck. The
outcome of the first card, as long as it is replaced,
has no effect on the probability of the outcome of
the second card.


Independent and Dependent Events [4]
Two events A and B are independent if and only if
P(B|A) = P(B) or P(A|B) = P(A), assuming the conditional
probabilities exist. Otherwise, A and B are dependent.
OR
When the occurrence of the first event in some way
changes the probability of the occurrence of the
second event, the two events are said to be
dependent.


Independent and Dependent Events [5]
Example 1: Suppose a card is selected from a deck
and not replaced, and a second card is selected. In
this case, the probability of selecting any specific
card on the first draw is 1/52, but since this card is not
replaced, the probability of selecting any other
specific card on the second draw is 1/51, since there
are only 51 cards left.


Independent and Dependent Events [6]

Example 2: Drawing a ball from an urn, not replacing
it, and then drawing a second ball.


Example: The table (not reproduced here) shows the results
of a study in which researchers examined a child’s IQ and
the presence of a specific gene in the child. Find the
probability that a child has a high IQ, given that the
child has the gene.


Solution
Let A be the event that the child has the gene and B the
event that the child has a high IQ.

P(B|A) = 33/72 ≈ 0.458


First Multiplication Rule [1]
Before explaining the first multiplication rule,
consider the example of tossing two coins. The
sample space is {HH, HT, TH, TT}. From classical
probability theory, it can be determined that the
probability of getting two heads is 1/4.

However, there is another way to determine the
probability of getting two heads. In this case, the
probability of getting a head on the first toss is 1/2,
and the probability of getting a head on the second
toss is also 1/2.


First Multiplication Rule [2]
So the probability of getting two heads can be
determined by multiplying: 1/2 × 1/2 = 1/4.


Multiplication Rule I [1]

Multiplication Rule I: For two independent events A and B,
P(A and B) = P(A) × P(B).

In other words, when two independent events occur
in sequence, the probability that both events will
occur can be found by multiplying the probabilities of
each individual event.

The word “and” is the key word: it means that both
events occur in sequence and that their probabilities
are multiplied.

Multiplication Rule I [2]

Example: A coin is tossed and a die is rolled. Find the
probability of getting a tail on the coin and a 5 on
the die.


Solution:
Let A be the event of getting a tail on the coin.
P(A) = 1/2 = 0.5 (or 50%)

Let B be the event of getting a 5 on the die.
P(B) = 1/6 = 0.1667 (or 16.67%)

Since A and B are independent events,

P(A and B) = P(A) × P(B)
= 1/2 × 1/6 = 1/12
= 0.0833 (or 8.33%)

Multiplication Rule I [3]
The previous example can also be solved using classical
probability. Recall that the sample space for tossing a
coin and rolling a die is
S = {H1, H2, H3, H4, H5, H6, T1, T2, T3, T4, T5, T6}
n(S) = 12
Let A be the event of getting a “T5”
A = {T5}
n(A) = 1
P(A) = 1/12
= 0.0833 (or 8.33%)


Multiplication Rule I [4]
Example: An urn contains 2 red balls, 3 green balls,
and 5 blue balls. A ball is selected at random and its
color is noted. Then it is replaced and another ball is
selected and its color is noted. Find the probability of
each of these:

a. Selecting 2 blue balls

b. Selecting a blue ball and then a red ball

c. Selecting a green ball and then a blue ball


Solution
Let R be the event of getting a red ball.
Let G be the event of getting a green ball.
Let B be the event of getting a blue ball.
P(R) = 2/10, P(G) = 3/10, P(B) = 5/10
Since the first ball is replaced, the two draws are independent, so
a. P(B and B) = P(BB) = P(B) × P(B) = 5/10 × 5/10 = 1/4
= 0.25 (or 25%)

b. P(B and R) = P(B) × P(R) = 5/10 × 2/10 = 1/10
= 0.10 (or 10%)

c. P(G and B) = P(G) × P(B) = 3/10 × 5/10 = 3/20
= 0.15 (or 15%)
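The three urn answers can be reproduced with exact fractions. This is a minimal sketch of Multiplication Rule I for draws with replacement; the dictionary urn and the function p_sequence_with_replacement are names invented for illustration.

```python
from fractions import Fraction

# Urn contents from the example: 2 red, 3 green, 5 blue.
urn = {"red": 2, "green": 3, "blue": 5}
total = sum(urn.values())

def p_sequence_with_replacement(colors):
    """Multiply the single-draw probabilities (draws are independent)."""
    result = Fraction(1)
    for color in colors:
        result *= Fraction(urn[color], total)
    return result

p_blue_blue = p_sequence_with_replacement(["blue", "blue"])
p_blue_red = p_sequence_with_replacement(["blue", "red"])
p_green_blue = p_sequence_with_replacement(["green", "blue"])
print(p_blue_blue, p_blue_red, p_green_blue)
```

Using Fraction keeps the results exact, so they can be compared directly with 1/4, 1/10, and 3/20.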


Multiplication Rule I [5]
Example: A die is tossed 3 times. Find the probability
of getting three 6s.


Solution
Let A be the event of getting a ‘6’ on a single toss.
P(A) = 1/6
Since the tosses are independent,
P(A and A and A) = P(A) × P(A) × P(A)
= 1/6 × 1/6 × 1/6
= 1/216 ≈ 0.0046 (or 0.46%)
OR
P(AAA) = 1/6 × 1/6 × 1/6
= 1/216
≈ 0.0046 (or 0.46%)
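As a cross-check, the same answer follows from listing all outcomes of three tosses. The sketch below assumes a fair die; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Every ordered outcome of three tosses of a fair die: 6 * 6 * 6 = 216.
rolls = list(product(range(1, 7), repeat=3))
three_sixes = [r for r in rolls if r == (6, 6, 6)]

p_enumerated = Fraction(len(three_sixes), len(rolls))  # classical probability
p_rule = Fraction(1, 6) ** 3                           # Multiplication Rule I
print(p_enumerated, p_rule, float(p_rule))
```

Both approaches give 1/216, which is about 0.00463.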


Multiplication Rule I [6]
Example: It is known that 66% of the students at a
large college favor building a new fitness center. If
two students are selected at random, find the
probability that both of them favor the building of a
new fitness center.


Solution
Let F be the event that a student favors the building of
a new fitness center.

P(F) = 0.66

Because the college is large, the two selections can be
treated as independent, so

P(F and F) or P(FF) = (0.66)(0.66)
= 0.4356 (or 43.56%)


Multiplication Rule II [1]
When two sequential events are dependent, a
slight variation of the multiplication rule is used to
find the probability of both events occurring.

For example, when a card is selected from an
ordinary deck of 52 cards, the probability of getting
a specific card is 1/52, but the probability of getting
a specific card on the second draw is 1/51 since 51
cards remain.


Example: Two cards are selected from a deck and the
first card is not replaced. Find the probability of
getting two kings.



Solution
P(two kings) = 4/52 × 3/51
= 12/2652
= 1/221
≈ 0.0045 (or 0.45%)
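The two-kings result can be confirmed by counting ordered pairs of distinct cards. This is a rough sketch, assuming a standard 52-card deck; the rank and suit labels are arbitrary choices made for the example.

```python
from fractions import Fraction
from itertools import permutations

# Build a standard 52-card deck as (rank, suit) pairs.
ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = "SHDC"
deck = [(rank, suit) for rank in ranks for suit in suits]

# Ordered pairs of different cards model two draws without replacement.
pairs = list(permutations(deck, 2))
two_kings = [p for p in pairs if p[0][0] == "K" and p[1][0] == "K"]

p_two_kings = Fraction(len(two_kings), len(pairs))
print(len(two_kings), len(pairs), p_two_kings)
```

The counts 12 and 2652 match the numerator and denominator in the worked solution.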


Multiplication Rule II [2]
When the two events A and B are dependent, the
probability that the second event B occurs after
the first event A has already occurred is written as
P(B|A).

This does not mean that B is divided by A; rather, it
means and is read as ‘‘the probability that event B
occurs given that event A has already occurred.’’

P(B|A) also means the conditional probability that
event B occurs given event A has occurred.

Multiplication Rule II [3]
The probability of an event B occurring when it is
known that some event A has occurred is called a
conditional probability and is denoted by P(B|A).

• The symbol P(B|A) is usually read “the probability
that B occurs given that A occurs”
OR
simply “the probability of B, given A.”


Multiplication Rule II [4]:
When two events are dependent, the probability of
both events occurring is P(A and B) = P(A) × P(B|A).


Example: A box contains 24 toasters, 3 of which are
defective. If two toasters are selected and tested,
find the probability that both are defective (assume
toasters are not replaced).



Solution
Let D1 be the event that the first toaster is defective.
Let D2 be the event that the second toaster is defective.
P(D1 and D2) = P(D1) × P(D2|D1)
= 3/24 × 2/23
= 1/8 × 2/23
= 1/92
≈ 0.0109 (or 1.09%)
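Multiplication Rule II chains one conditional probability per draw. The following sketch generalizes the toaster calculation to any number of draws without replacement; the function name p_all_defective and its parameters are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction

def p_all_defective(total, defective, draws):
    """P(D1 and D2 and ...) = P(D1) x P(D2|D1) x ..., without replacement."""
    result = Fraction(1)
    for k in range(draws):
        # After k defective items are removed, (defective - k) of (total - k) remain.
        result *= Fraction(defective - k, total - k)
    return result

p_both = p_all_defective(total=24, defective=3, draws=2)
print(p_both, float(p_both))
```

With 24 toasters, 3 defective, and 2 draws, the product is 3/24 × 2/23, the same as in the solution.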


Multiplication Rule II [5]:
When two events are dependent, the probability of
both events occurring is P(A and B) = P(A) × P(B|A).


Multiplication Rule II [6]:
Example: Two cards are drawn without replacement
from a deck of 52 cards. Find the probability that
both are queens.



Solution
Let Q1 be the event that the first card is a queen.
Let Q2 be the event that the second card is a queen.
P(Q1 and Q2) = P(Q1) × P(Q2|Q1)
= 4/52 × 3/51
= 1/221
≈ 0.0045 (or 0.45%)


Multiplication Rule II [7]:
Example: A box contains 3 orange balls, 3 yellow
balls, and 2 white balls. Three balls are selected
without replacement. Find the probability of
selecting 2 yellow balls and then a white ball, in
that order.


Solution
Orange balls   Yellow balls   White balls   Total balls
3              3              2             8

Let Y1 be the event that the first ball is yellow.
Let Y2 be the event that the second ball is yellow.
Let W3 be the event that the third ball is white.

P(Y1 and Y2 and W3) or P(Y1Y2W3) = 3/8 × 2/7 × 2/6
= 12/336
= 1/28
≈ 0.0357 (or 3.57%)
Note: The key word for the multiplication rule is “and”. It means to multiply.
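The product 3/8 × 2/7 × 2/6 is the probability of one specific order (yellow, yellow, white). The sketch below counts ordered draws to confirm it and, as an added comparison that is not part of the original example, also counts the draws that contain 2 yellow balls and 1 white ball in any order. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Label the 8 balls by color: 3 orange, 3 yellow, 2 white.
balls = ["O"] * 3 + ["Y"] * 3 + ["W"] * 2

# Ordered draws of 3 distinct balls (by index) model selection without replacement.
draws = list(permutations(range(len(balls)), 3))
colors = [tuple(balls[i] for i in draw) for draw in draws]

# Yellow, yellow, white in exactly that order.
p_yyw = Fraction(colors.count(("Y", "Y", "W")), len(colors))

# Two yellow and one white in any order (comparison added for illustration).
p_any_order = Fraction(
    sum(1 for c in colors if sorted(c) == ["W", "Y", "Y"]), len(colors)
)
print(len(colors), p_yyw, p_any_order)
```

The any-order probability is three times the ordered one, because each of the three possible positions of the white ball is equally likely.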


Multiplication Rule II [8]:
Example: A box contains 3 orange balls, 3 yellow
balls, and 2 white balls. Three balls are selected
without replacement. Find the probability of
selecting a white ball and then 2 yellow balls, in
that order.


Solution
Orange balls   Yellow balls   White balls   Total balls
3              3              2             8

Let W1 be the event that the first ball is white.
Let Y2 be the event that the second ball is yellow.
Let Y3 be the event that the third ball is yellow.

P(W1 and Y2 and Y3) or P(W1Y2Y3) = 2/8 × 3/7 × 2/6
= 12/336
= 1/28
≈ 0.0357 (or 3.57%)
Note: The key word for the multiplication rule is “and”.
It means to multiply.

Conditional Probability [1]
Previously, conditional probability was used to find
the probability of sequential events occurring when
they were dependent.

Recall that P(B|A) means the probability of event
B occurring given that event A has already
occurred.

Another situation where conditional probability
can be used is when additional information about
an event is known.

Sometimes it might be known that some
outcomes in the sample space have occurred or
that some outcomes cannot occur.

Conditional Probability [2]
When conditions on events are imposed or known,
the probability of a certain event occurring may
change.

Example: A die is rolled; find the probability of getting a
4 if it is known that an even number occurred when
the die was rolled.


Alternative Approach: Conditional Probability [1]
Solution:
If it is known that an even number has occurred, the
sample space is reduced to
S' = {2, 4, 6}
n(S') = 3
Let A be the event of getting a ‘4’
A = {4}
n(A) = 1
P(A) = n(A)/n(S') = 1/3 = 0.3333 (or 33.33%)

Sample space of two dice using table
A table can be used for the sample space when two
dice are rolled.
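The table for two dice can be generated rather than written out by hand. This short sketch prints one row per value of the first die and one column per value of the second, giving all 36 ordered pairs; the layout is a choice made for this illustration.

```python
from itertools import product

faces = range(1, 7)

# 36 ordered pairs (first die, second die).
sample_space = list(product(faces, repeat=2))

# One row per value of the first die, one column per value of the second.
for first in faces:
    print(" ".join(f"({first}, {second})" for second in faces))
```

Each cell of the printed table is one equally likely outcome when both dice are fair.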


Alternative Approach: Conditional Probability [2]
Example: Two dice are rolled. Find the probability of
getting a sum of 3 if it is known that the sum of the
spots on the dice was less than 6.


Solution
Reduced sample space S' = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1),
(2, 2), (2, 3), (3, 1), (3, 2), (4, 1)}
n(S') = 10

Let A be the event of getting a sum of 3.
A = {(1, 2), (2, 1)}, n(A) = 2

P(A) = 2/10 = 1/5
or
P(sum of 3 | sum less than 6) = 2/10
= 1/5 = 0.20 (or 20%)
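The reduced-sample-space method amounts to filtering the full sample space by the known condition and then counting. A minimal sketch, assuming two fair dice, with variable names chosen for illustration:

```python
from fractions import Fraction
from itertools import product

# Full sample space for two dice.
sample_space = list(product(range(1, 7), repeat=2))

# Keep only the outcomes consistent with the known condition (sum < 6).
reduced = [roll for roll in sample_space if sum(roll) < 6]

# Within the reduced space, count the outcomes with a sum of 3.
event = [roll for roll in reduced if sum(roll) == 3]

p_sum3_given_less_than_6 = Fraction(len(event), len(reduced))
print(len(reduced), len(event), p_sum3_given_less_than_6)
```

The counts 10 and 2 correspond to n(S') and n(A) in the solution.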


Alternative Approach: Conditional Probability [3]
The two previous examples of conditional probability
were solved using classical probability and reduced
sample spaces; however, they can be solved by using
the following formula for conditional probability.


Alternative Approach: Conditional Probability [4]
The conditional probability of event A given event B is
P(A|B) = P(A and B)/P(B), provided P(B) > 0.

P(A and B) means the probability of the outcomes
that events A and B have in common.
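The formula can be written as a small function over a finite sample space of equally likely outcomes. This is a sketch under that assumption; the function name conditional and its arguments are hypothetical, and the die example (a 4 given an even number) is the one used elsewhere in these notes.

```python
from fractions import Fraction

def conditional(sample_space, a, b):
    """P(A|B) = P(A and B) / P(B), assuming equally likely outcomes."""
    n = len(sample_space)
    p_b = Fraction(sum(1 for o in sample_space if b(o)), n)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    p_a_and_b = Fraction(sum(1 for o in sample_space if a(o) and b(o)), n)
    return p_a_and_b / p_b

die = list(range(1, 7))
p_four_given_even = conditional(die, lambda x: x == 4, lambda x: x % 2 == 0)
print(p_four_given_even)
```

Here P(A and B) = 1/6 and P(B) = 3/6, so the quotient is 1/3, without reducing the sample space first.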


Conditional Probability without reducing the sample space [1]
Example: A die is rolled; find the probability of getting a
4, if it is known that an even number occurred when
the die was rolled.


Solution
S = {1, 2, 3, 4, 5, 6}
Let the events be defined as:
A: Getting a 4 on the die
B: Getting an even number on the die
∵ P(A|B) = P(A and B)/P(B)
P(A and B) = 1/6
P(B) = 3/6

P(A|B) = P(A and B)/P(B) = 1/6 × 6/3
= 1/3 = 0.3333 (or 33.33%)



Conditional Probability without reducing the sample space [2]
Example: Two dice are rolled. Find the probability of
getting a sum of 3 if it is known that the sum of the
spots on the dice was less than 6.


Solution [1]:
∵ P(A|B) = P(A ∩ B)/P(B)
Let the events be defined as:
A: The sum of the spots on the dice is 3
B: The sum of the spots on the dice is less than 6
A ∩ B: The sum is 3 and the sum is less than 6

A ∩ B = {(2, 1), (1, 2)}
n(A ∩ B) = 2

P(A ∩ B) = 2/36 = 1/18
≈ 0.0556 (or 5.56%)

Solution [2]:
Let B be the event that the sum of the spots on the
dice is less than 6.
B = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3),
(3, 1), (3, 2), (4, 1)}

P(B) = 10/36 = 5/18 ≈ 0.2778 (or 27.78%)

P(A|B) = P(A and B)/P(B)
= 1/18 × 18/5 = 1/5 = 0.2 (or 20%)


Alternative approach: Conditional Probability with reducing the sample space
Example: Two dice are rolled. Find the probability of
getting a sum of 3 if it is known that the sum of the
spots on the dice was less than 6.


Solution
It is known that the sum of the spots on the dice
was less than 6.
Let the reduced sample space be S'.
⇒ S' = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3),
(3, 1), (3, 2), (4, 1)}
n(S') = 10
Let A be the event of getting a sum of 3.
A = {(2, 1), (1, 2)}
P(A) = 2/10 = 1/5 = 0.2 (or 20%)


Alternative approach: Conditional Probability with reducing the sample space
Example: When two dice were rolled, it is known
that the sum was an even number. In this case, find
the probability that the sum was 8.


Solution:
Reduced sample space S' =
{(1, 1), (1, 3), (1, 5), (2, 2), (2, 4), (2, 6), (3, 1), (3, 3),
(3, 5), (4, 2), (4, 4), (4, 6), (5, 1), (5, 3), (5, 5), (6, 2),
(6, 4), (6, 6)}
n(S') = 18
Let A be the event of getting a sum of 8.
A = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}
n(A) = 5
P(A) = 5/18 ≈ 0.2778 (or 27.78%)
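Both approaches from these notes (reducing the sample space, and applying the formula over the full sample space) can be compared on this example. The sketch below assumes two fair dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))        # full sample space, 36 outcomes
even = [r for r in rolls if sum(r) % 2 == 0]        # condition: sum is even
eight_and_even = [r for r in even if sum(r) == 8]   # sum is 8 (and therefore even)

# Approach 1: count inside the reduced sample space.
p_reduced = Fraction(len(eight_and_even), len(even))

# Approach 2: P(A|B) = P(A and B) / P(B) over the full sample space.
p_a_and_b = Fraction(len(eight_and_even), len(rolls))
p_b = Fraction(len(even), len(rolls))
p_formula = p_a_and_b / p_b

print(p_reduced, p_formula)
```

The two results agree because dividing both counts by 36 does not change their ratio.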


A Contingency Table
In statistics, a contingency table (also known as
a cross tabulation or crosstab) is a type of table in
a matrix format that displays the (multivariate)
frequency distribution of the variables.
Contingency tables are heavily used in survey research,
business intelligence, engineering, and scientific research.
They provide a basic picture of the interrelation
between two variables and can help find
interactions between them.


Example: This question uses a contingency table of
sex and pet ownership (table not reproduced in this text).

What is the probability a randomly selected person is
male, given that they own a pet?


Step 1: Repopulate the formula with new variables.
M stands for male and PO stands for pet owner, so the formula
P(A|B) = P(A and B)/P(B)
becomes:
P(M|PO) = P(M∩PO) / P(PO)    (1)

Step 2: Figure out P(M∩PO) from the table. The intersection
of male/pets (the intersection on the table of these two
factors) is 0.41.


Step 3: Figure out P(PO) from the table. From the total
column, 86% (0.86) of respondents had a pet.

Step 4: Insert your values into the formula:
P(M|PO) = P(M∩PO) / P(PO)
= 0.41 / 0.86
= 0.477, or 47.7%.
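The four steps reduce to a single division. The sketch below uses only the two values quoted in the steps (0.41 and 0.86); the variable names are made up for this example, and no other table cells are assumed.

```python
# Values quoted in Steps 2 and 3 of the worked example.
p_male_and_pet = 0.41   # P(M ∩ PO): joint probability from the table
p_pet = 0.86            # P(PO): marginal probability of owning a pet

# Equation (1): P(M|PO) = P(M ∩ PO) / P(PO)
p_male_given_pet = p_male_and_pet / p_pet
print(round(p_male_given_pet, 3))
```

Note that the denominator is P(PO), the probability of the condition, not P(M).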


Example: In a large housing plan, 35% of the homes
have a deck and a two-car garage, and 80% of the
houses have a two-car garage. Find the probability
that a house has a deck given that it has a two-car
garage.



Solution
Let D be the event that a house has a deck.
Let G be the event that a house has a two-car garage.
Given
P(D and G) = 0.35
P(G) = 0.80
P(deck | two-car garage) = P(D|G) = P(D and G)/P(G)
= 0.35/0.80 = 7/16
= 0.4375 (or 43.75%)
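The same calculation can be done with exact fractions to confirm the value 7/16. A minimal sketch; the variable names are illustrative, and the decimal inputs are passed as strings so that they are read exactly.

```python
from fractions import Fraction

p_deck_and_garage = Fraction("0.35")   # P(D and G), given in the problem
p_garage = Fraction("0.80")            # P(G), given in the problem

# P(D|G) = P(D and G) / P(G)
p_deck_given_garage = p_deck_and_garage / p_garage
print(p_deck_given_garage, float(p_deck_given_garage))
```

The 35% figure is a joint probability (deck and garage together), which is why it goes in the numerator.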


A summary of probability
