Mark Anthony Legaspi - Module 3 - Probability
City of Olongapo
GORDON COLLEGE
Olongapo City Sports Complex, Donor St., East Tapinac, Olongapo City
www.gordoncollege.edu.ph
Data Science
Module 3 - Probability
I. Introduction
This module is about probability, a very important concept that you need to
understand before delving into data science. Probability is defined as a
number that represents the likelihood of an uncertain event. Understanding
and modeling probabilities is a crucial component of data science (and
machine learning).
It is expected that after completing this module, you will have sufficient
knowledge of the probability concepts that you will need in data science. This is
especially important if you will be making predictions based on your data set and
need to understand the uncertainty associated with those predictions.
Please feel free to use other resources that you find on the Internet so
that you have numerous examples of the different concepts introduced in
this module.
A. Permutations
Permutations represent the number of different possible ways we can
arrange a number of elements.
Characteristics of Permutations:
• Arranging all elements within the sample space.
• No repetition.
• $P(n) = n \times (n-1) \times (n-2) \times \cdots \times 1 = n!$ (called “n factorial”)
Example:
• If we need to arrange 5 people, we would have P(5) = 120 ways of doing so.
Factorials express the product of all integers from 1 to n and we denote them
with the “!” symbol.
$n! = n \times (n-1) \times (n-2) \times \cdots \times 1$
Key Values:
• 0! = 1.
• If n<0, n! does not exist.
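As a quick check of the factorial formula, the sketch below compares Python's standard `math.factorial` with a brute-force count from `itertools.permutations`. The five names are placeholders and are not part of the module.

```python
import math
from itertools import permutations

# Five hypothetical people to arrange (placeholder names).
people = ["Ana", "Ben", "Carla", "Dino", "Ella"]

# P(n) = n! counts the arrangements of all n elements without repetition.
by_formula = math.factorial(len(people))

# Brute-force check: generate every ordering and count them.
by_enumeration = sum(1 for _ in permutations(people))

print(by_formula, by_enumeration)  # 120 120
print(math.factorial(0))           # 1, matching the key value 0! = 1
```

Calling `math.factorial` with a negative integer raises `ValueError`, which mirrors the statement that n! does not exist for n < 0.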
B. Variations
Variations represent the number of different possible ways we can pick
and arrange a number of elements.
C. Combinations
Combinations represent the number of different possible ways we can
pick a number of elements.
To win the lottery, you need to satisfy two distinct independent events:
• Correctly guess the “Powerball” number. (From 1 to 26)
• Correctly guess the 5 regular numbers. (From 1 to 69)
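Because the two events are independent, the number of possible tickets is the product of the two counts. The sketch below assumes the 5 regular numbers are distinct and that their order does not matter, which is why `math.comb` is used; that assumption comes from the section title, not from an explicit statement in the text.

```python
import math

# Ways to pick 5 distinct regular numbers out of 69 (order ignored).
regular = math.comb(69, 5)

# Ways to pick the single Powerball number out of 26.
powerball = 26

# Independent events: multiply the sizes of the two sample spaces.
tickets = regular * powerball

print(regular)   # 11238513
print(tickets)   # 292201338
print(1 / tickets)  # chance that one ticket matches everything
```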
Symmetry of Combinations
Let’s see the algebraic proof of the notion that selecting p-many elements out of a
set of n is the same as omitting n-p many elements.
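The proof itself is not reproduced in this copy of the module. A standard derivation is sketched below, assuming the notation $C^{n}_{p} = \frac{n!}{p!\,(n-p)!}$ for picking p elements out of n (the formula used in the exercise answers).

```latex
\begin{aligned}
C^{n}_{n-p} &= \frac{n!}{(n-p)!\,\bigl(n-(n-p)\bigr)!} \\
            &= \frac{n!}{(n-p)!\,p!} \\
            &= \frac{n!}{p!\,(n-p)!} = C^{n}_{p}
\end{aligned}
```

For instance, picking 5 banners out of 8 and choosing which 3 to leave out both give 56 options.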
D. Bayesian Notation
A set is a collection of elements, which hold certain values. Additionally,
every event has a set of outcomes that satisfy it.
The null-set (or empty set), denoted ∅, is a set which contains no values.
E. Multiple Events
The sets of outcomes that satisfy two events A and B can interact in one
of the following 3 ways.
F. Intersection
The intersection of two or more events expresses the set of outcomes
that satisfy all the events simultaneously. Graphically, this is the area
where the sets intersect.
Remember:
All complements are mutually exclusive, but not all mutually exclusive
sets are complements.
Example:
Dogs and Cats are mutually exclusive sets, since no species is
simultaneously a feline and a canine, but the two are not complements,
since there exist other types of animals as well.
J. Conditional Probability
For any two events A and B, such that the likelihood of B occurring is
greater than 0 (P(B) > 0), the conditional probability formula states the
following:
P(A|B) = P(A ∩ B) / P(B)
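A minimal sketch of conditional probability, P(A|B) = P(A ∩ B) / P(B), on a small sample space. The fair die and the two events are illustrative assumptions, not examples taken from the module.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (illustrative assumption).
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event A: the roll is even
B = {4, 5, 6}   # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & omega), len(omega))

# P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0.
p_a_given_b = prob(A & B) / prob(B)

print(p_a_given_b)  # 2/3
```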
L. Additive Law
The additive law calculates the probability of the union based on the
probability of the individual sets it accounts for.
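For two events, the additive law reads P(A ∪ B) = P(A) + P(B) − P(A ∩ B); the intersection is subtracted because it would otherwise be counted twice. The sketch below checks this on a fair die, which is an illustrative assumption rather than an example from the module.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair six-sided die (illustrative assumption)
A = {2, 4, 6}               # event A: the roll is even
B = {4, 5, 6}               # event B: the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

# Left side: probability of the union, counted directly.
union_direct = prob(A | B)

# Right side: additive law.
union_by_law = prob(A) + prob(B) - prob(A & B)

print(union_direct, union_by_law)  # 2/3 2/3
```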
N. Bayes’ Law
Bayes’ Law helps us understand the relationship between two events by
computing the different conditional probabilities.
We also call it Bayes’ Rule or Bayes’ Theorem.
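In its usual form, Bayes' Law states P(A|B) = P(B|A) × P(A) / P(B). The sketch below, again on an assumed fair die, computes P(A|B) both through Bayes' Law and directly from the definition to show that the two agree.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair six-sided die (illustrative assumption)
A = {2, 4, 6}               # event A: the roll is even
B = {5, 6}                  # event B: the roll is at least 5

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(omega))

# Conditional probability in the "other" direction: P(B|A).
p_b_given_a = prob(A & B) / prob(A)

# Bayes' Law: P(A|B) = P(B|A) * P(A) / P(B).
p_a_given_b_bayes = p_b_given_a * prob(A) / prob(B)

# Direct definition, used only as a cross-check.
p_a_given_b_direct = prob(A & B) / prob(B)

print(p_b_given_a)                            # 1/3
print(p_a_given_b_bayes, p_a_given_b_direct)  # 1/2 1/2
```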
O. Probability Distribution
A probability distribution is a statistical function that describes all the
possible values and likelihoods that a random variable can take within a
given range. This range will be bounded between the minimum and
maximum possible values, but precisely where the possible value is likely
to be plotted on the probability distribution depends on a number of
factors. These factors include the distribution's mean (average), standard
deviation, skewness, and kurtosis.
Perhaps the most common probability distribution is the normal
distribution, or "bell curve," although several other distributions are
commonly used. Typically, the data-generating process of some
phenomenon will dictate its probability distribution. For a continuous
variable, the function that describes this distribution is called the
probability density function.
a. Discrete Distribution
Discrete Distributions have finitely many different possible
outcomes. They possess several key characteristics which separate
them from continuous ones.
i. Uniform Distribution
A distribution where all the outcomes are equally likely is
called a Uniform Distribution.
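A small sketch of a discrete uniform distribution, assuming a fair six-sided die as the experiment (an illustrative choice). Every outcome receives the same probability, and the probabilities sum to 1.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]  # fair die, an illustrative assumption

# Uniform distribution: each outcome has probability 1 / (number of outcomes).
pmf = {x: Fraction(1, len(outcomes)) for x in outcomes}

total = sum(pmf.values())                    # must equal 1
mean = sum(x * p for x, p in pmf.items())    # expected value

print(pmf[3])   # 1/6
print(total)    # 1
print(mean)     # 7/2
```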
b. Continuous Distribution
If the possible values a random variable can take are a sequence
of infinitely many consecutive values, we are dealing with a
continuous distribution.
questions.
Answer:
First, all 5 tasks need to be arranged. In this problem, we are
looking for the number of Permutations of 5 elements. That number is
5! = 5 × 4 × 3 × 2 × 1 = 120, so there are 120 ways of completing
the tasks.
a) Calculate how many different options you have for the entire
campaign, assuming you want to use a different one for each
platform.
b) Calculate how many different options you have for the entire
campaign, assuming you can use the same banner for some or all the
platforms.
c) Calculate how many ways we can pick which of the 8 banners to
use, assuming we use different ones for each platform.
d) Calculate how many ways we can pick which of the 8 banners to
use, assuming we can use each one multiple times.
Answers:
a) We will use different banners for each platform and we can think
of each social media platform as a different position. In this case, we
will use the Variations because we are using different banners for
each platform. There should be no repeated values so the formula is
this:
$V = \frac{n!}{(n-p)!} = \frac{8!}{3!} = 4 \times 5 \times 6 \times 7 \times 8 = 6{,}720$
b) This is the same as part a), but this time we can repeat the values, so
we use Variations with repetition. The formula is:
$\bar{V} = n^{p} = 8^{5} = 32{,}768$
$C = \frac{n!}{p!\,(n-p)!} = \frac{8!}{5!\,3!} = \frac{6 \times 7 \times 8}{6} = 7 \times 8 = 56$
d) We will select 5 banners out of 8 again but this time we can use it
multiple times. So, we need to use the Combinations with repetition.
The formula and answer are this:
In this case, it is important to know both which banners we will use and
how many times we will use each one, so that we can assign them
accordingly. If we did not care how many times we use the banners we
have selected, then we would need to find the sum
$C^{8}_{5} + C^{8}_{4} + C^{8}_{3} + C^{8}_{2} + C^{8}_{1}$. This is
because we are counting the number of ways we can select the
banners, assuming we are using 5 different ones, 4 different ones, 3
different ones and so on.
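The four banner counts can be checked with Python's `math` module. The sketch assumes n = 8 banners and p = 5 platforms, as in the answers above. The formula for part d) does not survive in this copy, so the standard combinations-with-repetition count C(n + p − 1, p) is assumed for it.

```python
import math

n, p = 8, 5  # 8 banners, 5 platforms

variations = math.perm(n, p)                 # a) ordered, no repetition
variations_rep = n ** p                      # b) ordered, repetition allowed
combinations = math.comb(n, p)               # c) unordered, no repetition
combinations_rep = math.comb(n + p - 1, p)   # d) unordered, repetition allowed (assumed formula)

# Sum from the closing remark: choose 5, 4, 3, 2 or 1 different banners.
distinct_sets = sum(math.comb(n, k) for k in range(1, p + 1))

print(variations, variations_rep, combinations, combinations_rep)  # 6720 32768 56 792
print(distinct_sets)  # 218
```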
c. You are renovating your entire apartment and want to repaint the
walls of each room. The flat consists of two bedrooms, a kitchen, a
living room, a bathroom, a study and a hall, or 7 rooms in total. You
have at your disposal several colors of paint: white, yellow, orange,
red, purple, blue, green, grey and pink.
How many different ways can you paint the house, assuming…?
Answers:
a) In this case, we are dealing with Variations because we will use
different colors for each room in the apartment. So, we cannot
repeat the values, and the formula and answer are this:
$V = \frac{n!}{(n-p)!} = \frac{9!}{2!} = 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 = 181{,}440$
b) If we will paint the bathroom, study and hall in white, we will just
need to think about the other 4 rooms. Now, this problem can be
interpreted several different ways, so let us examine each outcome:
$V = \frac{n!}{(n-p)!} = \frac{8!}{4!} = 5 \times 6 \times 7 \times 8 = 1{,}680$
$\bar{V} = n^{p} = 9^{4} = 6{,}561$
We phrased the question with the idea of going for interpretation “b”,
but we see merit in the other approaches as well.
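The arithmetic in the painting answers can be verified as follows. The sketch only checks the three numbers that appear above; the wording of the individual interpretations is not available in this copy, so the comments describe the counts themselves rather than guess at the missing text.

```python
import math

# a) 9 colors, 7 rooms, a different color in every room.
all_rooms_distinct = math.perm(9, 7)

# b) 4 rooms left to paint: ordered picks of 4 out of 8 without repetition,
#    and ordered picks of 4 out of 9 with repetition allowed.
four_of_eight_distinct = math.perm(8, 4)
four_of_nine_repeated = 9 ** 4

print(all_rooms_distinct)       # 181440
print(four_of_eight_distinct)   # 1680
print(four_of_nine_repeated)    # 6561
```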
d. This year, you are helping organize your college’s career fest.
There are 11 companies participating, and you have
just enough room to fit all of them. How many ways can you
arrange the various firms, assuming…:
Answers:
ways, we can set up each group around the room, we just have
two events with distinct sample spaces.
Let’s start with arranging the 3 banks in the middle. Since we need
to split the 3 middle spots among the 3 banks, all we need to do is
compute the number of Permutations of 3 elements.
Therefore, P(3) = 3! = 6.
Now, since none of the remaining 8 firms cares too much where
they are positioned, we once again need to arrange them around
the room. Since we have 8 firms and 8 positions, we once again
rely on permutations, so P(8) = 8! = 40,320.
For any of the 40,320 ways we set the 8 firms around the room,
we have 6 different ways to arrange the 3 banks in the middle.
Therefore, in total, we have 40,320 × 6 = 241,920 ways of setting
up the career fest.
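The career-fest count multiplies two independent arrangements. A minimal check, assuming 3 banks in the 3 middle spots and 8 other firms in the 8 remaining spots as described above:

```python
import math

bank_arrangements = math.factorial(3)    # 3 banks in the 3 middle spots
other_arrangements = math.factorial(8)   # 8 firms in the 8 remaining spots

# Every arrangement of the banks can be paired with every arrangement of the rest.
total_arrangements = bank_arrangements * other_arrangements

print(bank_arrangements, other_arrangements, total_arrangements)  # 6 40320 241920
```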
Answer:
a) Now, if both twins enjoy all 5 cakes, then we need to find the number
of different combinations of picking 2 cakes out of these 5. Since we
are not explicitly told whether we could get the same cake for both,
we should consider both scenarios.
have $\bar{C} = \frac{(n+p-1)!}{p!\,(n-1)!} = \frac{5!}{2!\,3!} = 10$ options.
(Alternatively, we can get one Coconut cake and one other cake. That
way Steve will still have something else to eat. In that scenario, if we
can have two identical cakes, then the only option which we want to
avoid is the double Coconut one. Thus, we take the 15 we got in part
b of a), and subtract 1, so we get 14 options.)
c) If Amy loves chocolate, one of the two cakes must be Sacher. Then,
we only need to think about what the other one is. Since we have 5
different cakes, then we have 5 options for choosing the cakes.
a. If we decide to do so, then we have
$V = \frac{n!}{(n-p)!} = \frac{5!}{3!} = 4 \times 5 = 20$
different orders.
b. Now, if we are allowed to get them identical cakes, then we have
variations with repetition. Thus, $\bar{V} = 5^{2} = 25$.
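The cake counts can be reproduced by listing the options with `itertools`. Only "Coconut" and "Sacher" are named in the answers; the other three cake names below are placeholders, and reading the 10-option result as "Coconut excluded, repeats allowed" is an assumption based on the formula 5!/(2!·3!).

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

# Coconut and Sacher come from the text; the other names are hypothetical.
cakes = ["Coconut", "Sacher", "Cheesecake", "Carrot", "Lemon"]

with_repeat = list(combinations_with_replacement(cakes, 2))   # unordered, repeats allowed
no_double_coconut = [pair for pair in with_repeat
                     if pair != ("Coconut", "Coconut")]

# Assumed scenario behind the 10-option result: Coconut excluded, repeats allowed.
without_coconut = [c for c in cakes if c != "Coconut"]
no_coconut_repeat = list(combinations_with_replacement(without_coconut, 2))

ordered_distinct = list(permutations(cakes, 2))      # who gets which cake matters
ordered_repeat = list(product(cakes, repeat=2))      # ordered, identical cakes allowed

print(len(list(combinations(cakes, 2))))  # 10 unordered pairs of different cakes
print(len(with_repeat))                   # 15
print(len(no_double_coconut))             # 14
print(len(no_coconut_repeat))             # 10
print(len(ordered_distinct))              # 20
print(len(ordered_repeat))                # 25
```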
Answers:
3 __ __ __ __ __ __ __ __
3 20 __ __ __ __ __ __ __
For triceps and back we have two options each, so we add those as
well.
3 20 4 4 4 2 2 __ __
3 20 4 4 4 2 2 12 __
(Please ask your instructor for the download link of the solution file to
check your answers)
Here are the questions we left as homework towards the end of the
video:
Answer:
Answer:
We can interpret the likelihood of getting a place on the waiting list
two different ways and each is equally correct, given we clearly define
our understanding of the problem. If we assume that we want the
probability of getting on the waiting list, upon applying to Hamilton,
then the probability would equal the number of students on the
waiting list, over the total number of students who applied. From
table C1 we know that we had 2590 male and 3088 female applicants,
or 5678 total candidates that year. Since 1299 of them landed on the
waiting list, then the likelihood was: 1299 / 5678, or close to 22.88%.
Answer:
We know that 629 students accepted a place on the waiting list, and
out of those 629, 33 got admitted. Thus, the likelihood of getting
admitted, given a student accepted a place on the waiting list, equals
33 / 629, or 5.25%.
Answer:
This question might seem exactly the same as the one before, but this
time we are asking for the likelihood of being admitted, given the
student was offered a place on the waiting list. This means our sample
space is not just the 629 students who accepted a place on the
waiting list, but the entire 1299 who were offered one. Thus, the
likelihood equals 33 / 1299, or roughly 2.54%.
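The three waiting-list probabilities differ only in the sample space used as the denominator. The sketch below recomputes them from the counts quoted in the answers; the variable names are chosen here for illustration.

```python
# Counts quoted in the answers above.
applicants = 2590 + 3088      # male + female applicants
offered_waitlist = 1299       # students offered a place on the waiting list
accepted_waitlist = 629       # students who accepted that place
admitted_from_waitlist = 33   # students later admitted

p_waitlist = offered_waitlist / applicants
p_admit_given_accepted = admitted_from_waitlist / accepted_waitlist
p_admit_given_offered = admitted_from_waitlist / offered_waitlist

print(f"{p_waitlist:.2%}")              # 22.88%
print(f"{p_admit_given_accepted:.2%}")  # 5.25%
print(f"{p_admit_given_offered:.2%}")   # 2.54%
```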
(Please ask your instructor for the download link of the solution file to
check your answers)
1. Using your own words, differentiate the following terms from each other.
Permutation
A permutation is an arrangement of things where the order of arrangement
matters. It is denoted by nPr and its formula is nPr = n!/(n − r)!. A
permutation can be associated with position, and many permutations can be
derived from a single combination. Permutations denote the several ways to
arrange things, people, digits, letters, colors, etc. There are basically two
types of permutation: with repetition allowed and with no repetition. An
example with repetition allowed is a digit combination lock, because a code
such as 222 is possible and the digits can be repeated. An example with no
repetition is the first three people in a running race, because one runner
cannot be both first and second. A permutation answers the question: How many
different arrangements can be created from a given set of objects? A
permutation is nothing but an ordered combination.

Combination
A combination is a grouping or selection of things where order does not
matter. It is denoted by nCr and its formula is nCr = n!/(r!(n − r)!). Only
one combination can be derived from one permutation. Combinations indicate the
different ways of selecting menu items, food, clothes, subjects, etc. Just
like permutations, there are two types of combination: with repetition
allowed and with no repetition. An example with repetition allowed is the
coins in our pocket (1, 1, 5, 5, 5). An example with no repetition is a set of
lottery numbers (3, 5, 7, 23, 12, 16, 20, 18). A combination answers the
question: How many different groups can be picked from a larger group of
objects? A combination implies an unordered set or pairing of values within
specific criteria.
Population
A population may refer to an entire group of people, objects, events,
restaurants, places, or measurements. A population can thus be said to be an
aggregate observation of subjects grouped together by a common feature. It
includes all the elements of the data set, and measurable characteristics of
the population, such as the mean and standard deviation, are known as
parameters. A population can be vague or specific. Examples of populations
include the newborn babies in Singapore, the tech startups in Europe, the
heights of all PBA players in the Philippines, the weights of U.S. taxpayers,
the voting intentions of all voters in the Philippines, and all sales receipts
for October. A population can be the complete set of all similar items that
exist, or it can be a theoretical construct that is potentially infinite in
size; its members share a set of attributes that we define. There are
different types of population: Finite Population, Infinite Population,
Existent Population and Hypothetical Population.

Sample Space
A sample space is used a lot in the sciences and in mathematics. A sample
space is usually denoted by the letter S and it is the set of all possible
outcomes of the experiment. Each outcome in a sample space is called a sample
point; it is also called an element or a member of the sample space. A sample
space can be written using set notation, { }, with the possible outcomes
listed as elements of the set. It is common to refer to a sample space by the
labels S, Ω, or U (for "universal set"). The elements of a sample space may be
numbers, words, letters, or symbols. A sample space can be finite, countably
infinite, or uncountably infinite. The possible outcomes must be mutually
exclusive and exhaustive: mutually exclusive means they are distinct and
non-overlapping, and exhaustive means complete. When determining a sample
space, we must be careful to include all possibilities, and this may become a
difficult task when the sample space becomes very large. A sample space S is
either discrete or continuous. An example of a sample space is tossing a die.
The possible outcomes after tossing a die are the numbers 1, 2, 3, 4, 5, and
6, so the sample space is S = {1, 2, 3, 4, 5, 6}.
Discrete Probability Distribution
A discrete distribution is one in which the data can only take on certain
values, for example integers. A discrete distribution describes the
probability of occurrence of each value of a discrete random variable. A
discrete random variable is a random variable that has countable values, such
as a list of non-negative integers. In a discrete distribution, probabilities
can be assigned to the values in the distribution, for example, "the
probability that the web page will have 12 clicks in an hour is 0.15." There
are several specialized discrete probability distributions that are useful for
specific applications. For business applications, three frequently used
discrete distributions are the Binomial, Geometric, and Poisson. With a
discrete probability distribution, each possible value of the discrete random
variable can be associated with a non-zero probability. Thus, a discrete
probability distribution is often presented in tabular form. A discrete
distribution is appropriate when the variable can only take on a fixed number
of values. For example, if we roll a normal die, we can get 1, 2, 3, 4, 5, or
6. We cannot get 1.2 or 0.1. If it is a fair die, the probability distribution
will be 1/6, 1/6, 1/6, 1/6, 1/6, 1/6. As another example, we can use the
discrete Poisson distribution to describe the number of customer complaints
within a day. Suppose the average number of complaints per day is 10 and we
want to know the probability of receiving 5, 10, and 15 customer complaints in
a day.

Continuous Probability Distribution
A continuous distribution describes the probabilities of the possible values
of a continuous random variable. A continuous random variable is a random
variable with a set of possible values (known as the range) that is infinite
and uncountable. A continuous distribution is appropriate when the variable
can take on an infinite number of values. Continuous distributions cannot be
written out as neatly as, for example, the discrete uniform distribution.
Probabilities of a continuous random variable X are defined as the area under
the curve of its PDF. Thus, only ranges of values can have a nonzero
probability; the probability that a continuous random variable equals some
exact value is always zero. The continuous normal distribution can describe
the distribution of the weight of adult males. For example, we can calculate
the probability that a man weighs between 160 and 170 pounds. Many continuous
distributions are used for business applications. Two of the most widely used
are the Uniform and the Normal. The uniform distribution is useful because it
represents variables that are evenly distributed over a given interval, and
the normal distribution is useful for a wide array of applications in many
disciplines.
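The customer-complaints example can be worked out with the Poisson probability mass function, P(X = k) = e^(−λ) λ^k / k!. The sketch below assumes λ = 10 complaints per day, as stated in the example; the helper name `poisson_pmf` is introduced here for illustration.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 10  # average number of complaints per day, from the example

for k in (5, 10, 15):
    print(k, round(poisson_pmf(k, lam), 4))

# Sanity check: the probabilities over a wide range of counts sum to about 1.
total = sum(poisson_pmf(k, lam) for k in range(100))
print(round(total, 6))
```

The probabilities come out to roughly 0.038, 0.125 and 0.035, so receiving exactly the average number of complaints is the most likely of the three cases but still far from certain.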
a. Probability
b. Event
domain for an event because it is not a function; it is only a set. Moreover,
"event" is a specific term and "random variable" is a general term. An event
that is certain to happen has a probability of 1. An event that cannot
possibly happen has a probability of zero. If there is a chance that an
event will happen, then its probability is between zero and 1. Some
examples of events are tossing a coin and it landing on heads, rolling a '3'
on a die, and guessing a certain number between 000 and 999 (lottery).
There are different types of events in probability: impossible and sure
events, simple events, compound events, independent and dependent
events, mutually exclusive events, exhaustive events, complementary
events, events associated with “OR” and “AND”, and event E1 but not E2.
c. Mean
d. Standard deviation
The standard deviation, denoted by the Greek letter sigma (σ), is the
positive square root of the variance. It shows the variation in the data: if
the data are close together, the standard deviation will be small, and if the
data are spread out, the standard deviation will be large.
Since the standard deviation is measured in the same units as the random
variable, while the variance is measured in squared units, the standard
deviation is often the preferred measure. It is considered the most
reliable measure of variability, and it is affected by every individual
value in the distribution. The standard deviation is the root-mean-square
deviation from the mean and is a measure of the spread of a
distribution. Here is the formula for sample and population standard
deviation:
e. Variance
Variance is symbolized as σ² and it is the average of the squared differences
between each value and the mean. To find the variance σ² of a discrete
probability distribution, we need to find each deviation from the expected
value, square it, multiply it by its probability, and add the products. The
variance is the square of the standard deviation; in other words, once
we have obtained the value of the standard deviation, we can already
determine the value of the variance. The two differ only by the square
root. Here is the formula for sample and population variance:
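The formulas are not reproduced in this copy. In their standard form, the population variance is σ² = Σ(x − μ)² / N and the sample variance is s² = Σ(x − x̄)² / (n − 1); the standard deviations are their square roots. The sketch below uses Python's `statistics` module on a made-up data set and then applies the discrete-distribution recipe described above to a small probability table (values 0 to 3 with probabilities 1/8, 3/8, 3/8, 1/8).

```python
import statistics
from fractions import Fraction

# Illustrative data set (not taken from the module).
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # population variance, divides by N
pop_sd = statistics.pstdev(data)       # population standard deviation
samp_var = statistics.variance(data)   # sample variance, divides by n - 1
samp_sd = statistics.stdev(data)       # sample standard deviation

print(pop_var, pop_sd)
print(samp_var, samp_sd)

# Discrete distribution: square each deviation from the expected value,
# weight it by its probability, and add the products.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, var)  # 3/2 3/4
```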
X           0             1             2             3
P(X = x)    1/8 = 0.125   3/8 = 0.375   3/8 = 0.375   1/8 = 0.125

1/8 + 3/8 + 3/8 + 1/8 = 8/8 = 1
The table below represents the possible values of the random variable X and their
corresponding probabilities:
X           2     3     4     5     6     7     8     9     10    11    12
P(X = x)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

1/36 + 2/36 + 3/36 + 4/36 + 5/36 + 6/36 + 5/36 + 4/36 + 3/36 + 2/36 + 1/36 = 36/36 = 1
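These probabilities match the distribution of the sum of two fair six-sided dice. Assuming that is the experiment behind the table (the surviving text does not say so explicitly), the table can be rebuilt by enumerating all 36 equally likely outcomes:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 (die1, die2) pairs give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Convert counts to probabilities.
pmf = {total: Fraction(n, 36) for total, n in counts.items()}

for total in sorted(counts):
    print(total, f"{counts[total]}/36")

print(sum(pmf.values()))  # 1
```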
The figure above shows the Normal Distribution of the Overall column, which represents the
quality of a player on a scale of 1 to 100. This value is a weighted average of the many
individual stats each player has. The graph is bell-shaped and resembles a normal
distribution. The overall value is not entirely discrete but rather an approximation. One of
the main characteristics of a normal distribution is symmetry, and the overall values are
symmetrically distributed, so we can reasonably consider the game balanced and suitable for
competitive play.
The figure above shows the histogram of the first 30 players in the data set based on their ID
number. The graph resembles a Student's t distribution because it is symmetric, which
suggests that the game is balanced.
The figure above shows the Poisson Distribution of the players’ ages. Age is a discrete
variable and it starts at age 16, so we considered that the starting point or origin for the
Poisson distribution. Each bar in the graph shows the likelihood that a player in the
data is a specific age. Since the Poisson distribution is skewed, the younger players
outnumber the older ones.
The figure above shows the Binomial Distribution of the overall and potential stats. The graph is
bell-shaped and can be considered a binomial distribution.
The figure above shows the scatterplot of the daily views. Most of the views occur within
the first few days. The graph starts off at a very high point and drops down rather quickly.
We can see that the daily views start at around 100,000 but fall to about 20,000 within
a week. Once new videos are released and promoted, viewership drops to around
10,000 per day and steadily decreases as the video loses relevancy.
The figure above shows the scatterplot of the total views. The total views represent the
cumulative number of views up to a given point in time, that is, the aggregated number
of views the video got. The curve goes up at a decreasing rate before eventually plateauing,
which matches the CDF of the exponential distribution.
The figure above shows the scatterplot of the membership status. If the person is a
premium member, the value is 1, and if not, it is 0. Most people under the age of 34 don’t
have the membership, while most people over the age of 34 do. The data follows the
logistic distribution, since the likelihood of having a membership rises sharply near a
specific value.
3. Watch the following video lectures and write down any information that
you find useful about the application of probability to the fields of
finance, statistics and data science.
I’ve learned that finance often involves predicting the values and prices of
uncertain future events. One example is option pricing, which represents how
much we are willing to pay to enter the agreement, or the highest premium we
would accept. An option allows one of the sides to decide at a later date
whether to go through with a deal. I also learned that one of the parties must
pay a compensation called the premium. Whoever pays the premium gets to decide
whether the deal goes through when the predetermined point in the future
arrives. For example, you pay a premium of $100 to an investor for the
option to buy 10 stocks of Google at $1,100 apiece one week from today.
There is a 40% chance that the price will increase to $1,200 and a 60%
chance that it will fall to $1,000, so the price will either rise or
drop. The expected value in this case is $300; since it is greater than 0, the
deal is favorable and we should buy the option. I’ve also learned about the
decision tree, which describes the different possible payoffs we could get and
their associated probabilities of occurring. If the expected value is negative,
the deal is disadvantageous because we expect to lose money. If the expected
value is 0, it is known as a “fair deal”, and we are indifferent between taking
and not taking it. And if the expected value is positive, the rational move is
to follow through with the deal, since we expect to make a profit. The investor
can charge a higher premium to make it a “fair deal”. So, probability is used to
determine whether an investment opportunity is worth the money. Knowing how
likely or unlikely certain events are helps businesspeople make correct calls.
Probability really plays a role in finance because many businesses apply their
understanding of uncertainty and probability in their decision-making practices.
Probability models can greatly help businesses in optimizing their policies and
making safe decisions. Though complex, these probability methods can
increase the profitability and success of a business.
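The $300 expected value in the option example can be reproduced with a short calculation. The sketch assumes the buyer exercises the option only when the stock price is above the agreed $1,100, so in the bad scenario the loss is limited to the premium; the function name `payoff` is introduced here for illustration.

```python
from fractions import Fraction

premium = 100    # paid up front for the option
shares = 10      # number of stocks covered by the option
strike = 1100    # agreed purchase price per stock

# (probability, stock price in one week) for the two scenarios in the example.
scenarios = [(Fraction(40, 100), 1200), (Fraction(60, 100), 1000)]

def payoff(price):
    """Net result: exercise only if profitable, then subtract the premium."""
    return shares * max(price - strike, 0) - premium

# Expected value = sum of probability * payoff over all scenarios.
expected_value = sum(p * payoff(price) for p, price in scenarios)

print(payoff(1200), payoff(1000))  # 900 -100
print(expected_value)              # 300
```

Raising the premium lowers both payoffs by the same amount, which is how the seller could move the deal toward an expected value of 0, the "fair deal" mentioned above.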
I’ve learned that statistics constructs the pillars on which data science is built.
The more general the issue, the more we rely on the simpler concepts of
probability; the more concrete our interests are, the more we need to rely
on data science. Data Analysts, Data Scientists and Data Engineers should have
a good understanding of statistics and probability. Data analysis
usually examines past data, finds insights and makes reasonable predictions about
the future. Another thing that I’ve learned about is “Monte Carlo” simulation,
which generates artificial data to test the predictive power of
mathematical models. The data are usually not completely random but
follow certain restrictions. Most machine learning is an extremely fast-paced
trial-and-error process: the more predictions a model makes, the more precise
they become. The future is uncertain, so data scientists often try to predict what
will happen based on the information they have about past and present
data. Machine Learning and Deep Learning have very high predictive
power, but they are still not 100% certain. There are unpredictable events that
can occur in real life, like earthquakes, volcanic eruptions or sudden scientific
breakthroughs, that can completely change the anticipated course of events.
Lastly, I’ve learned that Data Science is an expansion of
probability, statistics and programming that implements computational
technology to solve more advanced questions, and that data science relies on
expected values. Learning probability helps us make informed
decisions about the likelihood of events, based on a pattern of collected data. In
the context of data science, statistical inferences are often used to analyze or
predict trends from data, and these inferences use probability distributions
of data.
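As a small illustration of the Monte Carlo idea mentioned above, the sketch below generates artificial dice rolls and compares the simulated frequency of a sum of 7 with the exact probability 6/36. The experiment, the number of trials and the fixed seed are all choices made here for illustration.

```python
import random

random.seed(42)      # fixed seed so the run is repeatable
trials = 100_000     # number of simulated rolls of two dice

# Count the simulated rolls whose two dice add up to 7.
hits = sum(
    1 for _ in range(trials)
    if random.randint(1, 6) + random.randint(1, 6) == 7
)

estimate = hits / trials
exact = 6 / 36

print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

With more trials the simulated value tends to settle closer to the exact one, which is the sense in which artificial data can be used to test a probability model.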