Mark Anthony Legaspi - Module 3 - Probability

This document outlines a module on probability for a data science course. It introduces probability and explains that it represents the likelihood of an uncertain event. The module objectives are to learn about combinations, permutations, variations, Bayes' law, distributions, and applying probability in fields like finance. Key concepts covered include combinations, permutations, variations, sets, events, conditional probability, and more. Students are directed to watch video lectures explaining these probability topics.

Republic of the Philippines

City of Olongapo
GORDON COLLEGE
Olongapo City Sports Complex, Donor St., East Tapinac, Olongapo City
www.gordoncollege.edu.ph

Data Science
Module 3 - Probability

Name (LN,FN,MN): Legaspi, Mark Anthony A. Program/Yr/Block: BSIT-3A

I. Introduction
This module is about probability, a very important concept that you need to
understand before delving into data science. Probability is defined as a
number that represents the likelihood of an uncertain event. Understanding
and modeling probabilities is a crucial component of data science (and
machine learning).

It is expected that after completing this module, you will have sufficient
knowledge of probability for your work in data science. This is especially
true if you will be making predictions based on your data set and need to
understand the uncertainty associated with those predictions.

Please feel free to use other resources that you might find on the Internet in
order for you to have numerous examples of the different concepts that will
be introduced in this module.

II. Learning Objectives


After completing this module, you should be able to:
1. Perform computations in Combinatorics: Permutations, Variations
and Combinations.
2. Interpret the relationships between possible outcomes of various
events using Bayes’ Law.
3. Determine the kind of distribution a data set follows, which is
crucial in making accurate predictions about the future.
4. Recognize continuous distributions, including common examples,
applications and their formulas.
5. Explain how probability is applied in the fields of finance,
statistics and data science.

III. Topics and Key Concepts


Please watch Video Lectures 23-26 about introduction to probability

Please watch Video Lecture 27: “Fundamentals of Combinatorics”

A. Permutations
Permutations represent the number of different possible ways we can
arrange a number of elements.

Prepared by: Mr. Arnie Armada



Characteristics of Permutations:
• Arranging all elements within the sample space.
• No repetition.
• $P(n) = n \times (n-1) \times (n-2) \times \cdots \times 1 = n!$ (called “n factorial”)

Example:
• If we need to arrange 5 people, we would have P(5) = 120 ways of doing so.
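
The count in this example can be checked both with the factorial formula and by listing every arrangement. This is a minimal Python sketch; the five names are hypothetical placeholders.

```python
import math
from itertools import permutations

people = ["Ana", "Ben", "Cara", "Dan", "Eli"]  # hypothetical names

# Closed form: P(n) = n!
count_formula = math.factorial(len(people))

# Brute force: generate every possible ordering and count them
count_enumerated = sum(1 for _ in permutations(people))

print(count_formula, count_enumerated)
```

Both counts agree, which is a quick sanity check that the formula counts full arrangements without repetition.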

Please watch Video Lecture: “Permutations and How to Use Them”.

Factorials express the product of all integers from 1 to n and we denote them
with the “!” symbol.
$n! = n \times (n-1) \times (n-2) \times \cdots \times 1$

Key Values:
• 0! = 1.
• If n<0, n! does not exist.

Please watch Video Lecture: “Simple Operations with Factorial”.

B. Variations
Variations represent the number of different possible ways we can pick
and arrange a number of elements.
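
The answers later in this module use two variation formulas: $V = \frac{n!}{(n-p)!}$ without repetition and $\bar{V} = n^p$ with repetition. The sketch below evaluates both in Python and cross-checks them by enumeration on a small case. It assumes Python 3.8 or newer for `math.perm`; the values of `n` and `p` are only illustrative.

```python
import math
from itertools import permutations, product

n, p = 8, 5  # illustrative: pick and arrange 5 out of 8 elements

without_repetition = math.perm(n, p)  # n! / (n - p)!
with_repetition = n ** p              # n^p

# Cross-check on a small case (4 elements, 2 positions) by enumeration
small = range(4)
small_without = sum(1 for _ in permutations(small, 2))   # ordered, no repeats
small_with = sum(1 for _ in product(small, repeat=2))    # ordered, repeats allowed

print(without_repetition, with_repetition, small_without, small_with)
```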


Please watch video lecture: “Solving Variations with Repetition”.


Please watch video lecture: “Solving Variations without Repetition”.

C. Combinations
Combinations represent the number of different possible ways we can
pick a number of elements.

Characteristics of Combinations with separate sample spaces:


• The option we choose for any element does not affect the number of
options for the other elements.
• The order in which we pick the individual elements is arbitrary.
• We need to know the size of the sample space for each individual
element. ($n_1, n_2, \ldots, n_p$)

Please watch video lecture: “Solving Combinations”

Example: Winning the Lottery

To win the lottery, you need to satisfy two distinct independent events:
• Correctly guess the “Powerball” number. (From 1 to 26)
• Correctly guess the 5 regular numbers. (From 1 to 69)
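
Because the two guesses have separate sample spaces, the total number of possible tickets is the product of the two sizes. The Python sketch below assumes (this is not stated above) that the 5 regular numbers are distinct and that their order does not matter.

```python
import math

regular_ways = math.comb(69, 5)  # choose 5 distinct numbers out of 69, order ignored
powerball_ways = 26              # one number from 1 to 26

# Separate sample spaces: multiply the sizes
total_tickets = regular_ways * powerball_ways

print(f"Chance of winning with one ticket: 1 in {total_tickets:,}")
```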


Symmetry of Combinations

Let’s see the algebraic proof of the notion that selecting p-many elements out of a
set of n is the same as omitting n-p many elements.

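For starters, recall the combination formula:

$$C(n, p) = \frac{n!}{p!\,(n-p)!}$$

If we plug in $n - p$ for $p$, we get the following:

$$C(n, n-p) = \frac{n!}{(n-p)!\,\bigl(n-(n-p)\bigr)!} = \frac{n!}{(n-p)!\,p!}$$

Therefore, we can conclude that $C(n, p) = C(n, n-p)$.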

Please watch video lecture: “Symmetry of Combinations”

Please watch video lecture 34-37 about combinatorics.

D. Bayesian Notation
A set is a collection of elements, which hold certain values. Additionally,
every event has a set of outcomes that satisfy it.
The null-set (or empty set), denoted ∅, is a set which contains no values.

Please watch video lecture: “Sets and Events”.


E. Multiple Events
The sets of outcomes that satisfy two events A and B can interact in one
of the following 3 ways.

Please watch video lecture: “Ways Sets Can Interact”.

F. Intersection
The intersection of two or more events expresses the set of outcomes
that satisfy all the events simultaneously. Graphically, this is the area
where the sets intersect.

Please watch video lecture: “Intersection of Sets”.


G. Union
The union of two or more events expresses the set of outcomes that
satisfy at least one of the events.
Graphically, this is the area that includes both sets.
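
Intersection and union map directly onto Python's built-in set operators. The sketch below uses one roll of a six-sided die as an assumed example; the two events are hypothetical.

```python
# Sample space: one roll of a six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

even = {2, 4, 6}            # event A: the roll is even
greater_than_3 = {4, 5, 6}  # event B: the roll is greater than 3

intersection = even & greater_than_3     # outcomes satisfying both events
union = even | greater_than_3            # outcomes satisfying at least one event
complement_of_even = sample_space - even # outcomes that do not satisfy A

print(intersection, union, complement_of_even)
```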

Please watch video lecture: “Union of Sets”.


H. Mutually Exclusive Sets


Sets with no overlapping elements are called mutually exclusive.
Graphically, their circles never touch.

Remember:
All complements are mutually exclusive, but not all mutually exclusive
sets are complements.

Example:
Dogs and Cats are mutually exclusive sets, since no species is
simultaneously a feline and a canine, but the two are not complements,
since there exist other types of animals as well.

Please watch video lecture: “Mutually Exclusive Sets”.

I. Independent and Dependent Events


If the likelihood of event A occurring (P(A)) is affected by event B
occurring, then we say that A and B are dependent events. Otherwise,
the two events are independent.
We express the probability of event A occurring, given that event B has
occurred, as $P(A \mid B)$. We call this the conditional probability.

Please watch video lecture: “Dependence and Independence of Sets”.

J. Conditional Probability


For any two events A and B, such that the likelihood of B occurring is
greater than 0 ($P(B) > 0$), the conditional probability formula states the
following:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
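
A minimal Python sketch of a conditional probability computed on a fair die. The events are hypothetical, and `prob` is an illustrative helper that assumes all outcomes are equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

# P(A | B) = P(A and B) / P(B), defined only when P(B) > 0
p_a_given_b = prob(A & B) / prob(B)

print(p_a_given_b)
```

Knowing that the roll is greater than 3 changes the chance of an even roll from 1/2 to 2/3, so these two events are dependent.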

Please watch video lecture: “The Conditional Probability Formula”.

K. Law of Total Probability


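The law of total probability dictates that for any set A, which is a union of
many mutually exclusive sets $B_1, B_2, \ldots, B_n$, its probability equals the
following sum:

$$P(A) = P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_n)\,P(B_n)$$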

Please watch video lecture: “The Law of Total Probability”.

L. Additive Law
The additive law calculates the probability of the union based on the
probability of the individual sets it accounts for.
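
In its standard form for two events, the additive law reads $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. The sketch below checks this on a fair die; the events and the `prob` helper are illustrative assumptions.

```python
from fractions import Fraction

sample_space = set(range(1, 7))
A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))

direct = prob(A | B)                          # count the union directly
additive = prob(A) + prob(B) - prob(A & B)    # additive law

print(direct, additive)
```

Subtracting the intersection avoids counting the outcomes 4 and 6 twice.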


Please watch video lecture: “The Additive Rule”.

M. The Multiplication Rule


The multiplication rule calculates the probability of the intersection based
on the conditional probability.
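
In its standard form the rule reads $P(A \cap B) = P(A \mid B) \times P(B)$. As a hedged illustration (the card example is an assumption, not part of the module), the sketch below computes the chance of drawing two aces in a row from a standard 52-card deck without replacement.

```python
from fractions import Fraction

# B: the first card is an ace
p_first_ace = Fraction(4, 52)

# A given B: the second card is an ace, with one ace already removed
p_second_ace_given_first = Fraction(3, 51)

# Multiplication rule: P(A and B) = P(A | B) * P(B)
p_two_aces = p_second_ace_given_first * p_first_ace

print(p_two_aces)
```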

Please watch video lecture: “The Multiplication Law”.

N. Bayes’ Law
Bayes’ Law helps us understand the relationship between two events by
computing the different conditional probabilities.
We also call it Bayes’ Rule or Bayes’ Theorem.
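
In its standard form, Bayes’ Law reads $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$. The sketch below applies it to a screening test. All three input probabilities are hypothetical values chosen for illustration.

```python
from fractions import Fraction

# Hypothetical inputs (not taken from the module)
p_condition = Fraction(1, 100)             # P(A): prior probability of the condition
p_pos_given_condition = Fraction(95, 100)  # P(B | A): test is positive when condition is present
p_pos_given_healthy = Fraction(5, 100)     # P(B | not A): false positive rate

# Law of total probability gives P(B), the overall chance of a positive test
p_pos = (p_pos_given_condition * p_condition
         + p_pos_given_healthy * (1 - p_condition))

# Bayes' Law: P(A | B) = P(B | A) * P(A) / P(B)
p_condition_given_pos = p_pos_given_condition * p_condition / p_pos

print(p_condition_given_pos, float(p_condition_given_pos))
```

With these assumed numbers, a positive result still leaves the condition fairly unlikely, because the prior probability is so small.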


Please watch video lecture: “Bayes’ Law”.

O. Probability Distribution
A probability distribution is a statistical function that describes all the
possible values and likelihoods that a random variable can take within a
given range. This range will be bounded between the minimum and
maximum possible values, but precisely where the possible value is likely
to be plotted on the probability distribution depends on a number of
factors. These factors include the distribution's mean (average), standard
deviation, skewness, and kurtosis.
Perhaps the most common probability distribution is the normal
distribution, or "bell curve," although several distributions exist that are
commonly used. Typically, the data generating process of some
phenomenon will dictate its probability distribution. For a continuous
variable, that distribution is described by a probability density function.


As a simple example of a probability distribution, let us look at the
number observed when rolling two standard six-sided dice. Each die has a
1/6 probability of rolling any single number, one through six, but the sum
of two dice is not uniformly distributed. Seven is the most common
outcome, with six ways to roll it (1+6, 6+1, 5+2, 2+5, 3+4, 4+3). Two and
twelve, on the other hand, are far less likely, with one way each (1+1 and 6+6).
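
The distribution of the sum of two dice can be built by counting all 36 equally likely pairs. This is a minimal Python sketch with a text histogram.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) pairs give each sum
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Convert counts to probabilities
distribution = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in distribution.items():
    print(f"{total:>2}: {'#' * counts[total]:<6} {p}")
```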

Population vs. Sample

Please watch video lecture: “Fundamentals of Probability Distributions”.

P. Types of Probability Distribution


Certain distributions share characteristics, so we separate them into
types. The well-defined types of distributions we often deal with have
elegant statistics. We distinguish between two big types of distributions
based on the type of the possible values for the variable – discrete and
continuous.

Please watch video lecture: “Types of Probability Distribution”.

a. Discrete Distribution
Discrete Distributions have a finite (or at most countable) number of
distinct possible outcomes. They possess several key characteristics
which separate them from continuous ones.


Please watch video lecture: “Characteristics of Discrete Distributions”.

i. Uniform Distribution
A distribution where all the outcomes are equally likely is
called a Uniform Distribution.

Please watch video lecture: “The Uniform Distribution”.

ii. Bernoulli Distribution


A distribution consisting of a single trial and only two
possible outcomes – success or failure – is called a
Bernoulli Distribution.


Please watch video lecture: “The Bernoulli Distribution”.

iii. Binomial Distribution


The binomial distribution is used when there are exactly
two mutually exclusive outcomes of a trial. These
outcomes are appropriately labeled "success" and
"failure". The binomial distribution is used to obtain the
probability of observing x successes in N trials, with the
probability of success on a single trial denoted by p. The
binomial distribution assumes that p is fixed for all trials.

For example, a coin toss has only two possible outcomes:
heads or tails, and taking a test could have two possible
outcomes: pass or fail.

Binomial distributions must also meet the following three
criteria:

• The number of observations or trials is fixed. In
other words, you can only figure out the probability
of something happening if you do it a certain
number of times. This is common sense: if you toss
a coin once, your probability of getting a tail is 50%.
If you toss a coin 20 times, your probability of
getting at least one tail is very, very close to 100%.
• Each observation or trial is independent. In other
words, none of your trials have an effect on the
probability of the next trial.
• The probability of success (tails, heads, fail or pass)
is exactly the same from one trial to another.
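
The standard binomial probability of exactly x successes in N trials is $P(X = x) = C(N, x)\,p^x (1-p)^{N-x}$. The sketch below evaluates it for the coin example above; `binomial_pmf` is an illustrative helper name, not something defined in the module.

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with the same success probability p."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of at least one tail in 20 fair coin tosses:
# the complement of getting zero tails
p_at_least_one_tail = 1 - binomial_pmf(0, 20, 0.5)

# Probability of exactly 10 tails in 20 tosses
p_exactly_ten = binomial_pmf(10, 20, 0.5)

print(p_at_least_one_tail, p_exactly_ten)
```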


Please watch video lecture: “The Binomial Distribution”.

iv. Poisson Distribution


When we want to know the likelihood of a certain event
occurring over a given interval of time or distance
we use a Poisson Distribution.
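
The standard Poisson probability of observing exactly k events in an interval, given an average rate λ per interval, is $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$. The rate used below is a hypothetical value, and `poisson_pmf` is an illustrative helper name.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the average per interval is lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 4  # hypothetical average, e.g. 4 questions asked per lecture

p_exactly_7 = poisson_pmf(7, lam)                         # exactly 7 events
p_at_most_2 = sum(poisson_pmf(k, lam) for k in range(3))  # 0, 1 or 2 events

print(p_exactly_7, p_at_most_2)
```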

Please watch video lecture: “The Poisson Distribution”.

b. Continuous Distribution
If the possible values a random variable can take are a sequence
of infinitely many consecutive values, we are dealing with a
continuous distribution.


Please watch video lecture: “Characteristics of Continuous Distributions”.
i. Normal Distribution
A Normal Distribution represents a distribution that most
natural events follow.

Please watch video lecture: “The Normal Distribution”.

ii. Standardizing a Normal Distribution


To standardize any normal distribution, we need to
transform it so that the mean is 0 and the standard
deviation (and therefore the variance) is 1.
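
The usual transformation subtracts the mean and divides by the standard deviation: $z = \frac{x - \mu}{\sigma}$. The sketch below applies it to a small hypothetical data set, treating the list as the whole population, and confirms the result has mean 0 and standard deviation 1.

```python
import statistics

heights = [150.0, 160.0, 165.0, 170.0, 180.0]  # hypothetical data

mu = statistics.fmean(heights)       # mean
sigma = statistics.pstdev(heights)   # population standard deviation

# z = (x - mu) / sigma for every observation
z_scores = [(x - mu) / sigma for x in heights]

print(statistics.fmean(z_scores), statistics.pstdev(z_scores))
```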


Please watch video lecture: “The Standard Normal Distribution”.

iii. Student’s T Distribution

A Student’s T Distribution represents a small sample size
approximation of a Normal Distribution.

Please watch video lecture: “The Students’ T Distribution”.

iv. The Chi-Squared Distribution


A Chi-Squared distribution is often used in hypothesis testing,
for example to test goodness of fit.

Please watch video lecture: “The Chi-Squared Distribution”.


v. Exponential Distribution
The Exponential Distribution is usually observed in events
which significantly change early on.


Please watch video lecture: “The Exponential Distribution”.

vi. The Logistic Distribution


The Continuous Logistic Distribution is observed when
trying to determine how continuous variable inputs
can affect the probability of a binary outcome.

Please watch video lecture: “The Logistic Distribution”.

IV. Learning Tasks


A. Engage
1. Combinatorics Exercise (60 points)
For the following set of problems determine what part of Combinatorics
we need to use and apply the appropriate formula. Keep in mind that
there might be more than one correct approach to some (or all) of these
questions.

a. Imagine you are working at an office and have 5 tasks labelled as
“Critical” in Jira to complete by the end of the day. In how many ways
can you complete said tasks before the day ends?

(** “Jira” is a Project Management software which allows you to
create tasks and label them depending on their importance. “Critical”
is the highest level of importance and no task with a lower level can
be started once such a task is initiated.)

Answer:
All 5 tasks need to be arranged, so we are looking for the number of
Permutations of 5 elements. That number is
$5! = 5 \times 4 \times 3 \times 2 \times 1 = 120$, so there are 120 ways
of completing my tasks.

b. Imagine your company is trying to gain customers by running an
online ad campaign. The idea is to focus on a certain demographic
which frequently uses social media. Your campaign will run ads on
Facebook, Messenger, Instagram, Twitter and Reddit. Your graphic
designers have created 8 different versions of the banner you can use.
Based on this information:

a) Calculate how many different options you have for the entire
campaign, assuming you want to use a different one for each
platform.
b) Calculate how many different options you have for the entire
campaign, assuming you can use the same banner for some or all the
platforms.
c) Calculate how many ways we can pick which of the 8 banners to
use, assuming we use different ones for each platform.
d) Calculate how many ways we can pick which of the 8 banners to
use, assuming we can use each one multiple times.

Answers:

Now, we have 8 banners at our disposal, and we need to put them on
5 platforms.

a) We will use a different banner for each platform, and we can think
of each social media platform as a different position. Because the
position matters and no banner is repeated, we use Variations without
repetition, so the formula is this:


$$V = \frac{n!}{(n-p)!} = \frac{8!}{3!} = 4 \times 5 \times 6 \times 7 \times 8 = 6{,}720$$

b) This is the same as the first part, but this time we can repeat the
banners, so we use Variations with repetition. The formula is this:

$$\bar{V} = n^p = 8^5 = 32{,}768$$

c) We will select 5 banners out of 8 to use. We will use different
banners for each social media platform, so there should be no
repetition. In this case, we will use the formula of Combinations
without repetition. The formula and answer are this:

$$C = \frac{n!}{p!\,(n-p)!} = \frac{8!}{5!\,3!} = \frac{6 \times 7 \times 8}{6} = 7 \times 8 = 56.$$

d) We will select 5 banners out of 8 again, but this time we can use each
one multiple times. So, we need to use Combinations with repetition.
The formula and answer are this:

$$\bar{C} = \frac{(n+p-1)!}{p!\,(n-1)!} = \frac{12!}{5!\,7!} = \frac{8 \times 9 \times 10 \times 11 \times 12}{1 \times 2 \times 3 \times 4 \times 5} = \frac{(8 \times 9 \times 11) \times 10 \times 12}{(2 \times 5) \times (3 \times 4)} = 8 \times 9 \times 11 = 792.$$

In this case, it is important to know how many times we will use each
banner and also to know which banner we will use, so we can assign them
accordingly. If we did not care how many times we use the banners we
have selected, then we would need to find the sum
$C(8,5) + C(8,4) + C(8,3) + C(8,2) + C(8,1)$. This is because we are
counting the number of ways we can select the banners, assuming we are
using 5 different ones, 4 different ones, 3 different ones and so on.
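
The four banner answers can be double-checked in Python. This is a minimal sketch, assuming Python 3.8 or newer for `math.perm` and `math.comb`; the combination counts are also cross-checked by brute-force enumeration.

```python
import math
from itertools import combinations, combinations_with_replacement

n, p = 8, 5  # 8 banners, 5 platforms

a = math.perm(n, p)          # a) variations without repetition
b = n ** p                   # b) variations with repetition
c = math.comb(n, p)          # c) combinations without repetition
d = math.comb(n + p - 1, p)  # d) combinations with repetition

# Cross-check c) and d) by listing every selection
banners = range(n)
assert c == sum(1 for _ in combinations(banners, p))
assert d == sum(1 for _ in combinations_with_replacement(banners, p))

# Sum from the closing remark: C(8,5) + C(8,4) + ... + C(8,1)
distinct_sets = sum(math.comb(n, k) for k in range(1, p + 1))

print(a, b, c, d, distinct_sets)
```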

c. You are renovating your entire apartment and want to repaint the
walls of each room. The flat consists of two bedrooms, a kitchen, a
living room, a bathroom, a study and a hall, or 7 rooms in total. You
have at your disposal several colors of paint: white, yellow, orange,
red, purple, blue, green, grey and pink.

How many different ways can you paint the house, assuming…?

a) …you paint all the rooms in different colors?


b) …you paint the bathroom, study and hall in white?


c) …you paint the two bedrooms in identical color?
d) …you can only use grey and yellow?

Answers:
a) In this case, we are dealing with Variations without repetition,
because we will use a different color for each room in the apartment.
With n = 9 colors and p = 7 rooms, the formula and answer are this:

$$V = \frac{n!}{(n-p)!} = \frac{9!}{2!} = 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 = 181{,}440.$$

b) If we will paint the bathroom, study and hall in white, we will just
need to think about the other 4 rooms. Now, this problem can be
interpreted several different ways, so let us examine each outcome:

a. If we paint the other 4 rooms in 4 different colors, and white may
still be one of them, we use Variations without repetition with all
9 colors, so this is the formula:

$$V = \frac{n!}{(n-p)!} = \frac{9!}{5!} = 6 \times 7 \times 8 \times 9 = 3{,}024$$

b. If we paint the other 4 rooms in 4 different colors, none of which
is white, we again have Variations without repetition. Since we have
already used white, we are down to only 8 colors. Thus,

$$V = \frac{n!}{(n-p)!} = \frac{8!}{4!} = 5 \times 6 \times 7 \times 8 = 1{,}680.$$

c. If there is no restriction on the remaining 4 rooms, we can use any
color we want. Therefore, we have Variations with repetition, and we
can use “white”. Thus,

$$\bar{V} = n^p = 9^4 = 6{,}561.$$

We phrased the question with the idea of going for interpretation “b”,
but we see merit in the other approaches as well.

c) If we paint the two bedrooms in the same color, we can think of
them as a single big room. Thus, the number of rooms becomes 6
instead of 7.

a. There is no restriction on whether we can repeat any of the
colors we use, so we have variations with repetition once
more: $\bar{V} = n^p = 9^6 = 531{,}441$.


b. Alternatively, it is not clearly stated we can repeat values, so
let us examine the alternative. If we cannot repeat values, we
have variations without repetition, so the formula we use
becomes:

$$V = \frac{n!}{(n-p)!} = \frac{9!}{3!} = 4 \times 5 \times 6 \times 7 \times 8 \times 9 = 60{,}480.$$

d) Using only grey and yellow means we have 2 colors to choose
from, so $n = 2$. Additionally, to paint all the rooms we must repeat
one or both colors. Therefore, these are Variations with
repetition, so the formula is the following:

$$\bar{V} = n^p = 2^7 = 128$$

d. This year, you are helping organize your college’s career fest.
There are 11 companies which are participating, and you have
just enough room to fit all of them. How many ways can you
arrange the various firms, assuming…:

a) … no firm has any preference where they want to be positioned?


b) … JP Morgan representatives made a deal, where they have to be
located exactly in the middle?
c) … JP Morgan, Citi Bank and Morgan-Stanley must be positioned in
the middle 3 spots?
d) … Deutsche Bank representatives cancel, so you can give the
additional space to one of the other companies?

Answers:

We have 11 firms and 11 spots where we can place each one.

a) If no firm has any preference, then we need to arrange the entire
set of 11 firms across the room. Thus, we need to use
Permutations, so:
$P(11) = 11! = 39{,}916{,}800$.

b) If JP Morgan has to be located in the middle, then we only need
to arrange the remaining 10 firms around the room. Thus, we can
once again use permutations, but this time $n = 10$. Thus,
$P(10) = 10! = 3{,}628{,}800$.

c) One approach to this problem is by looking at two distinct groups
of firms – JP Morgan, Citi Bank and MS as one group, and the
other 8 firms as the second group. Then, if we find the number of
ways we can set up each group around the room, we just have
two events with distinct sample spaces.

Let’s start with arranging the 3 banks in the middle. Since we need
to split the 3 middle spots among the 3 banks, all we need to do is
compute the number of Permutations among 3 elements.
Therefore, $P(3) = 3! = 6$.

Now, since none of the remaining 8 firms cares too much where
they are positioned, we once again need to arrange them around
the room. Since we have 8 firms and 8 positions, we once again
rely on permutations, so $P(8) = 8! = 40{,}320$.

For any of the 40,320 ways we set the 8 firms around the room,
we have 6 different ways to arrange the 3 banks in the middle.
Therefore, in total, we have 40,320 × 6 = 241,920 ways of setting
up the career fest.

d) We have 10 firms, which need to fill out 11 spots. If we
start filling up the room in some specific order, then there are
going to be 10 options for who gets the first position. Since any
firm can be given the additional space provided by DB’s
withdrawal, there are once again 10 options for the second
spot. Then, there would be 9 different options for the third and so
on. This results in 10 × 10 × 9 × 8 × … × 1 = 10 × 10! =
36,288,000 options to arrange the firms.
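
The four career fest answers can be reproduced with `math.factorial`. The sketch below also brute-forces a scaled-down version of part c) to show why the two group counts are multiplied. The small case (5 firms, 3 of which must take the middle 3 spots) is a hypothetical stand-in for the full problem.

```python
import math
from itertools import permutations

a = math.factorial(11)                      # a) all 11 firms arranged freely
b = math.factorial(10)                      # b) one firm fixed in the middle
c = math.factorial(3) * math.factorial(8)   # c) 3 banks in the middle, 8 others around
d = 10 * math.factorial(10)                 # d) 10 firms, one receives the extra spot

# Scaled-down check of part c): 5 firms in a row, banks A, B, C must fill spots 2 to 4
firms = ["A", "B", "C", "X", "Y"]
banks = {"A", "B", "C"}
small_case = sum(1 for order in permutations(firms) if set(order[1:4]) == banks)

print(a, b, c, d, small_case)
```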

e. Your best friend is organizing a birthday party for her twins –
Amy and Steve – and she put you in charge of ordering the
cakes. The bakery offers several types of cakes – a Cheesecake,
a Sacher Cake, a Chiffon Cake, a Coconut Cake and a Carrot
Cake. How many different ways can you order the cakes,
assuming that…?

a) … both twins enjoy all the 5 types of cake?


b) … Steve dislikes Coconuts?
c) … Amy loves chocolate (Sacher)?
d) … each cake comes with a generic “Happy Birthday!” wish?
e) … each cake comes with a personalized “Happy Birthday Steve!” or
“Happy Birthday Amy!” sign?

Answer:

a) Now, if both twins enjoy all 5 cakes, then we need to find the number
of different combinations of picking 2 cakes out of these 5. Since we
are not explicitly told whether we could get the same cake for both,
we should consider both scenarios.

a. Assuming we wish to get different cakes, then we use the
formula for Combinations without repetition:

$$C = \frac{n!}{p!\,(n-p)!} = \frac{5!}{2!\,3!} = 10.$$

b. Assuming we can get them identical cakes, that means we have 5
more options – Cheese and Cheese, Sacher and Sacher and so on.
Therefore, we have 15 different ways of getting these cakes.
Additionally, we can use the formula for Combinations with repetition
to get:

$$\bar{C} = \frac{(n+p-1)!}{p!\,(n-1)!} = \frac{6!}{2!\,4!} = \frac{6 \times 5}{1 \times 2} = 15.$$

b) Since Steve dislikes coconuts, our options are limited to 4 cakes.
Then, we need to choose two of the 4 remaining cakes, so
$C = \frac{n!}{p!\,(n-p)!} = \frac{4!}{2!\,2!} = 6$. If we can get two
identical cakes, then we have
$\bar{C} = \frac{(n+p-1)!}{p!\,(n-1)!} = \frac{5!}{2!\,3!} = 10$ options.

(Alternatively, we can get one Coconut cake and 1 other cake. That
way Steve will still have something else to eat. In that scenario, if we
can have two identical cakes, then the only option which we want to
avoid is the double Coconut one. Thus, we take the 15 we got in part
b of a), and subtract 1, so we get 14 options.

Now, if we want to have 2 different cakes, we need to remove the
double Cheesecake, double Sacher, double Chiffon and double Carrot
cake options. Therefore, there would be 10 different orders we could
make.)

c) If Amy loves chocolate, one of the two cakes must be Sacher. Then,
we only need to think about what the other one is. Since we have 5
different cakes, then we have 5 options for choosing the cakes.

d) Now, if the cakes come with generic “Happy Birthday” wishes, it
does not matter who gets each cake. Then, since we are not given any
additional indication of preference, we can assume this is equivalent
to part a). Thus, there are 15 different orders we can make.

e) Now, since it is vital to select the appropriate wish on each of the
two cakes, this means that we are dealing with variations. Once again,
we have two approaches depending on whether we wish to get them
different cakes.

a. If we decide to do so, then we have
$V = \frac{n!}{(n-p)!} = \frac{5!}{3!} = 4 \times 5 = 20$ different orders.
b. Now, if we are allowed to get them identical cakes, then we have
variations with repetition. Thus, $\bar{V} = 5^2 = 25$.
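
The cake counts are small enough to verify by listing every order with `itertools`. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)

cakes = ["Cheesecake", "Sacher", "Chiffon", "Coconut", "Carrot"]
no_coconut = [c for c in cakes if c != "Coconut"]

# Generic wishes: who gets which cake does not matter (combinations)
different_unordered = len(list(combinations(cakes, 2)))
any_unordered = len(list(combinations_with_replacement(cakes, 2)))

# Steve dislikes coconut: only 4 cakes to choose from
no_coconut_different = len(list(combinations(no_coconut, 2)))
no_coconut_any = len(list(combinations_with_replacement(no_coconut, 2)))

# Personalized wishes: who gets which cake matters (variations)
different_ordered = len(list(permutations(cakes, 2)))
any_ordered = len(list(product(cakes, repeat=2)))

print(different_unordered, any_unordered,
      no_coconut_different, no_coconut_any,
      different_ordered, any_ordered)
```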

f. You want to go to the gym between lectures every day, but
you only have an hour to work out. Knowing this, you decide to
do a circuit workout. You start with 5 minutes of cardio as a
warm-up, then you hit two different leg exercises, followed by
a chest exercise, as well as one for shoulders. After that, you
continue with a bicep exercise and a triceps one, before
moving to the back one. You finish the circuit with 2
abdominal exercises like a plank and some crunches. After
completing the circuit several times, you end with another 10
minutes of cardio before you stretch and leave.
Now, assuming the gym has ellipticals, treadmills and stationary bikes,
you have 3 options for the cardio. Additionally, you have 5 different
leg exercises you can do. You have 4 choices of what to do for each of
the next 3 muscle groups (chest, shoulders and bicep). For triceps you
have heavy preferences towards two specific exercises, so you always
do one of the two. The same can be said about the back. When it
comes to the abdominal exercises, you have 4 options once again.
Taking everything into consideration, if you want to do a different
workout each day, how long will it take you to run out of options?

Answers:

To begin with, notice that this entire exercise is just an example of
Combinations of events with separate sample spaces. Our best
approach would be to go through the workout regime one exercise at
a time and determine the size of the sample space at each instance.
__ __ __ __ __ __ __ __ __ __ __
Start with the warm-up cardio. We have 3 options, so we fill out the
first position.

3 __ __ __ __ __ __ __ __

Next, we go over the leg exercises. We want to do 2 different
exercises and we care which one we do first. Thus, we have
$V = \frac{n!}{(n-p)!} = \frac{5!}{3!} = 4 \times 5 = 20$ options for the legs.


3 20 __ __ __ __ __ __ __

Then we have 4 alternatives for each of the next 3 groups – chest,


shoulders and biceps.
3 20 4 4 4 __ __ __ __

For triceps and back we have two options each, so we add those as
well.
3 20 4 4 4 2 2 __ __

When it comes to the abdominal exercises, we take the same
approach we did with the leg exercises. Thus, we have
$V = \frac{n!}{(n-p)!} = \frac{4!}{2!} = 3 \times 4 = 12$ options for abs.

3 20 4 4 4 2 2 12 __

Our warm-down consists of additional cardio, for which we have 3
options.
3 20 4 4 4 2 2 12 3

Now, to solve this, we need to multiply the sizes of the different
sample spaces we just defined. Thus, we find the product 3 × 20 × 4 ×
4 × 4 × 2 × 2 × 12 × 3. This results in 552,960 different variations of
the same circuit workout. At one workout per day, that is more than
1,500 years of distinct workouts, so realistically you will never run
out of options.
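
As a quick check of the arithmetic above, here is a minimal Python sketch that multiplies the sizes of the nine sample spaces. The variable names are illustrative only, and the 365-day year is an assumption made for the rough "how long" estimate.

```python
import math

# Sizes of the sample spaces for each slot in the circuit, in order.
cardio = 3
legs = math.perm(5, 2)     # 2 of 5 leg exercises, order matters
upper = 4                  # chest, shoulders, biceps: 4 choices each
triceps = 2
back = 2
abs_ = math.perm(4, 2)     # 2 of 4 abdominal exercises, order matters

slots = [cardio, legs, upper, upper, upper, triceps, back, abs_, cardio]
total_workouts = math.prod(slots)
years = total_workouts / 365   # assuming one workout every day

print(total_workouts)   # 552960
print(round(years))     # 1515
```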

(Please ask your instructor for the download link of the solution file to
check your answers)

2. Bayesian Inference (40 points)

Please watch video lecture: “A Practical Example of Bayesian Inference”.

Here are the questions we left as homework towards the end of the
video:

1) What is the likelihood for a male student to be accepted?


Answer:

The likelihood for a male student to be accepted equals the number
of admitted male students over the number of male students who
applied, so it is 634 / 2590, or approximately 24.48%.

2) What is the likelihood for a female student to be accepted?

Answer:

The likelihood for a female student to be accepted is found the same
way as for a male student: the number of admitted female students
over the number of female students who applied, so it is 741 / 3088,
or approximately 24.00%.

3) Does gender have an effect on your chances of being accepted?


Answer:

I think it is slightly more competitive to get accepted if you are a
woman, but the two likelihoods are almost the same.

4) Are first time freshmen men or women more likely to enroll?


Answer:

The likelihood of first-time freshmen men enrolling equals the
number of men who enrolled over the number of men who were
accepted, or 217 / 634, approximately 34.23%. We find the associated
likelihood for women in a similar way: 263 / 741, or close to 35.49%.
Women are therefore slightly more likely to enroll. The higher
enrollment rate makes sense, given the lower acceptance rate among
female applicants.
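
The four answers above all divide one count by another, so they can be reproduced with a short Python sketch. The counts are the ones quoted in the answers; the dictionary and function names are illustrative assumptions.

```python
# Counts quoted in the answers above.
applied = {"men": 2590, "women": 3088}
accepted = {"men": 634, "women": 741}
enrolled = {"men": 217, "women": 263}

def rate(part: int, whole: int) -> float:
    """Return part / whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)

for group in ("men", "women"):
    acceptance = rate(accepted[group], applied[group])    # P(accepted | applied)
    enrollment = rate(enrolled[group], accepted[group])   # P(enrolled | accepted)
    print(f"{group}: accepted {acceptance}%, enrolled {enrollment}%")
```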

Onto table C2.

1) What is the likelihood of being offered a place on the waitlist?

Answer:
We can interpret the likelihood of getting a place on the waiting list
two different ways and each is equally correct, given we clearly define
our understanding of the problem. If we assume that we want the
probability of getting on the waiting list, upon applying to Hamilton,
then the probability would equal the number of students on the
waiting list, over the total number of students who applied. From
table C1 we know that we had 2590 male and 3088 female applicants,
or 5678 total candidates that year. Since 1299 of them landed on the
waiting list, then the likelihood was: 1299 / 5678, or close to 22.88%.

Alternatively, we might want to calculate the probability of landing on
the waiting list after not getting accepted. In this case, our sample size
decreases from 5678 to 4303 after we take away the 634 male and
741 female accepted students. Then, the likelihood of getting a spot
on the waiting list becomes 1299 / 4303, or roughly 30.19%.

2) What is the likelihood of getting admitted, having accepted a place on
the waitlist?

Answer:
We know that 629 students accepted a place on the waiting list, and
out of those 629, 33 got admitted. Thus, the likelihood of getting
admitted, given a student accepted a place on the waiting list, equals
33 / 629, or 5.25%.

3) What is the likelihood of getting admitted, having been offered a place
on the waitlist?

Answer:
This question might seem exactly the same as the one before, but this
time we are asking for the likelihood of being admitted, given the
student was offered a place on the waiting list. This means our sample
space is not just the 629 students who accepted a place on the
waiting list, but the entire 1299 who were offered one. Thus, the
likelihood equals 33 / 1299, or roughly 2.54%.
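
The waitlist answers differ only in which group is used as the sample space (the denominator). A minimal Python sketch, using the counts quoted above and illustrative variable names, makes that explicit:

```python
total_applicants = 2590 + 3088      # all applicants
total_accepted = 634 + 741          # accepted outright
offered_waitlist = 1299
accepted_waitlist = 629
admitted_from_waitlist = 33

# Same numerator, different sample spaces.
p_offer_given_applied = offered_waitlist / total_applicants
p_offer_given_not_accepted = offered_waitlist / (total_applicants - total_accepted)
p_admit_given_accepted_spot = admitted_from_waitlist / accepted_waitlist
p_admit_given_offered_spot = admitted_from_waitlist / offered_waitlist

for label, p in [
    ("offered a spot, given applied", p_offer_given_applied),
    ("offered a spot, given not accepted", p_offer_given_not_accepted),
    ("admitted, given accepted a spot", p_admit_given_accepted_spot),
    ("admitted, given offered a spot", p_admit_given_offered_spot),
]:
    print(f"{label}: {p:.2%}")
```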

(Please ask your instructor for the download link of the solution file to
check your answers)

B. Explain (50 points)

1. Using your own words, differentiate the following terms from each other.
Permutation vs. Combination

Permutation: A permutation is an arrangement of things in which the order
matters. It is denoted by nPr and its formula is nPr = n! / (n - r)!. A
permutation can be associated with position, and many permutations can be
derived from a single combination. Permutations describe the ways to
arrange things, people, digits, letters, colors, etc. There are basically
two types of permutation: with repetition allowed and without repetition.
An example with repetition is a digit combination lock, because the code
could be 222 and the digits can be repeated. An example without repetition
is the first three finishers in a running race, because one runner cannot
be both first and second. Permutation answers the question: how many
different arrangements can be created from a given set of objects? A
permutation is nothing but an ordered combination.

Combination: A combination is a grouping or selection of things in which
the order does not matter. It is denoted by nCr and its formula is
nCr = n! / (r! (n - r)!). Only one combination can be derived from a single
permutation. Combinations describe the different ways of selecting menu
items, food, clothes, subjects, etc. Just like permutations, there are two
types of combination: with repetition allowed and without repetition. An
example with repetition is the coins in our pocket (1, 1, 5, 5, 5). An
example without repetition is a set of lottery numbers (3, 5, 7, 23, 12,
16, 20, 18). Combination answers the question: how many different groups
can be picked from a larger group of objects? A combination implies an
unordered set or pairing of values within specific criteria.
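
To make the difference concrete, here is a small Python sketch based on the running-race and lock examples. The field of 8 runners and the 3-digit lock are assumptions chosen for illustration; `math.perm` and `math.comb` are the standard-library functions for nPr and nCr.

```python
import math

runners = 8     # hypothetical number of runners in the race
podium = 3      # first, second and third place

ordered_podiums = math.perm(runners, podium)     # nPr = n! / (n - r)!
unordered_groups = math.comb(runners, podium)    # nCr = n! / (r! (n - r)!)
lock_codes = 10 ** 3                             # 3-digit lock, digits may repeat

print(ordered_podiums)                        # 336
print(unordered_groups)                       # 56
print(ordered_podiums // unordered_groups)    # 6 orderings (3!) per group
print(lock_codes)                             # 1000
```

Each group of three runners can finish in 3! = 6 different orders, which is why there are six times as many permutations as combinations.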
Population vs. Sample Space

Population: A population may refer to an entire group of people, objects,
events, restaurants, places, or measurements. A population can thus be
said to be an aggregate of subjects grouped together by a common feature.
It includes all the elements of the data set, and measurable
characteristics of the population, such as the mean and standard
deviation, are known as parameters. A population can be vague or specific.
Examples include all newborn babies in Singapore, all tech startups in
Europe, all PBA players in the Philippines (when studying average height),
all U.S. taxpayers (when studying mean weight), all voters in the
Philippines (when studying voting intentions), and all sales receipts for
October. A population can be the complete set of all similar items that
exist, or it can be a theoretical construct that is potentially infinite
in size; in either case its members share a set of attributes that we
define. There are different types of population: finite, infinite,
existent and hypothetical.

Sample space: Sample spaces are used a lot in the sciences and in
mathematics. A sample space is the set of all possible outcomes of an
experiment. It is usually denoted by the letter S, although the labels Ω
and U (for "universal set") are also common. Each outcome in a sample
space is called a sample point, also known as an element or member of the
sample space. A sample space can be written in set notation, { }, with the
possible outcomes listed as elements of the set. The elements of a sample
space may be numbers, words, letters, or symbols. A sample space can be
finite, countably infinite, or uncountably infinite, so it is either
discrete or continuous. The possible outcomes must be mutually exclusive
and exhaustive: mutually exclusive means they are distinct and
non-overlapping, and exhaustive means the list is complete. When
determining a sample space, we must be careful to include all
possibilities, which may become a difficult task when the sample space is
very large. An example is tossing a die: the possible outcomes are the
numbers 1, 2, 3, 4, 5, and 6, so the sample space is
S = {1, 2, 3, 4, 5, 6}.
Discrete Probability Distribution vs. Continuous Probability Distribution

Discrete probability distribution: A discrete distribution is one in which
the data can only take on certain values, for example integers. It
describes the probability of occurrence of each value of a discrete random
variable, which is a random variable that has countable values, such as a
list of non-negative integers. In a discrete distribution, probabilities
can be assigned to the individual values, for example, "the probability
that the web page will have 12 clicks in an hour is 0.15." There are
several specialized discrete probability distributions that are useful for
specific applications. For business applications, three frequently used
discrete distributions are the Binomial, Geometric, and Poisson. With a
discrete probability distribution, each possible value of the discrete
random variable can be associated with a non-zero probability, so the
distribution is often presented in tabular form. A discrete distribution
is appropriate when the variable can only take on a countable set of
values. For example, if we roll a normal die, we can get 1, 2, 3, 4, 5, or
6; we cannot get 1.2 or 0.1. If it is a fair die, the probability
distribution is 1/6, 1/6, 1/6, 1/6, 1/6, 1/6. As another example, we can
use the discrete Poisson distribution to describe the number of customer
complaints within a day. Suppose the average number of complaints per day
is 10 and we want to know the probability of receiving 5, 10, and 15
customer complaints in a day.

Continuous probability distribution: A continuous distribution describes
the probabilities of the possible values of a continuous random variable,
which is a random variable whose set of possible values (known as the
range) is infinite and uncountable. A continuous distribution is
appropriate when the variable can take on any value within an interval.
Continuous distributions cannot be written as neatly as the uniform
discrete distribution. Probabilities for a continuous random variable X
are defined as the area under the curve of its probability density
function (PDF). Thus, only ranges of values can have a non-zero
probability; the probability that a continuous random variable equals any
single value is always zero. For example, the continuous normal
distribution can describe the distribution of the weight of adult males,
and we can calculate the probability that a man weighs between 160 and 170
pounds. Many continuous distributions are used for business applications.
Two of the most widely used are the Uniform and the Normal. The uniform
distribution is useful because it represents variables that are evenly
distributed over a given interval, and the normal distribution is useful
for a wide array of applications in many disciplines.
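
The customer-complaint example (an average of 10 complaints per day) can be worked out with the Poisson probability mass function, P(X = k) = e^(-λ) λ^k / k!. The sketch below is a minimal illustration; the function name `poisson_pmf` is an assumption, not part of any library.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

average_complaints = 10   # average number of complaints per day

for k in (5, 10, 15):
    print(f"P(X = {k}) = {poisson_pmf(k, average_complaints):.4f}")
```

Under these assumptions, exactly 10 complaints is the most likely of the three counts, while 5 and 15 are each several times less likely.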

2. Define each of the following terms using your own words.

a. Probability

Probability is the chance that an event will or will not happen. It is
one of the major branches of mathematics, a central concept in statistics,
and widely used in advanced statistics. Probability is a systematic
approach to dealing with uncertainty, yet it only considers the scenarios
we can think of, the known unknowns. Probability is measured on a scale
from 0 to 1. If the value is 0, the event is impossible, and if the value
is 1, the event is certain to happen. In short, probability is the chance
that a particular outcome of an uncertain event occurs. For example, when
we toss a coin, we do not know whether we will get a head or a tail, but
we can determine the chances of getting each. There are various types of
probability and probability distributions, and they are widely used in
data science and big data analytics. We make predictions on the basis of
probability. We cannot be sure of any event, and that is why we need
probability: it lets us say, with some level of confidence such as 95%,
that what we are trying to predict will happen, because no one can predict
future outcomes with certainty.

b. Event

An event can be defined as a set of outcomes of an experiment. An event is
always a subset of the sample space, and there is no concept of range and
domain for an event because it is not a function, only a set. Moreover,
"event" is a specific term, while "random variable" is a general term. An
event that is certain to happen has a probability of 1. An event that
cannot possibly happen has a probability of 0. If an event may or may not
happen, its probability is between 0 and 1. Some examples of events are a
tossed coin landing on heads, rolling a '3' on a die, and guessing a
certain number between 000 and 999 (lottery). There are different types of
events in probability: impossible and sure events, simple events, compound
events, independent and dependent events, mutually exclusive events,
exhaustive events, complementary events, events associated with “OR” and
“AND”, and the event E1 but not E2.

c. Mean

The mean is the expected value of a random variable. It is denoted by
E(X) or μ and is a weighted average of the values the random variable may
assume. It is called a “weighted average” because more frequent values of
X are weighted more heavily in the average. It is also how we expect X to
behave on average over the long run. When we know the probability p of
every value x, we can calculate the expected value (mean) of X: μ = Σxp.
There is a sample mean, which is the mean of the sample values collected,
and a population mean, which is the mean of all the values in the
population. If the sample is random and the sample size is large, the
sample mean is a good estimate of the population mean. Expected value is
an extremely useful concept for good decision-making.

d. Standard deviation
The standard deviation, denoted by the Greek letter sigma (σ), is the
positive square root of the variance. It shows the variation in the data.
If the data is close together, the standard deviation will be small, and
if the data is spread out, the standard deviation will be large. Since the
standard deviation is measured in the same units as the random variable
and the variance is measured in squared units, the standard deviation is
often the preferred measure. The standard deviation is often regarded as
the most reliable measure of variability. It is affected by every
individual value in the distribution. The standard deviation is the root
mean square deviation from the mean and is a measure of the spread of a
distribution. Here are the formulas for the population and sample standard
deviation:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$ (population)
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$ (sample)

e. Variance
Variance is symbolized as σ² and it is the average of the squared
differences between each value and the mean. To find the variance σ² of a
discrete probability distribution, we find each deviation from the
expected value, square it, multiply it by its probability, and add the
products. The variance is the square of the standard deviation; in other
words, once we have the value of the standard deviation, we can already
determine the value of the variance. The only difference between the two
is the square root. Here are the formulas for the population and sample
variance:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$ (population)
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (sample)
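
The definitions of mean, variance and standard deviation for a discrete distribution can be checked with a short Python sketch. It uses a fair six-sided die as the example; the variable names are illustrative, and exact fractions are used so the results are not affected by rounding.

```python
from fractions import Fraction
import math

# Fair six-sided die: each face has probability 1/6.
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

# Mean: weighted average of the values, mu = sum of x * p.
mean = sum(x * p for x, p in distribution.items())

# Variance: squared deviations from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in distribution.items())

# Standard deviation: positive square root of the variance.
std_dev = math.sqrt(variance)

print(mean)                 # 7/2
print(variance)             # 35/12
print(round(std_dev, 4))    # 1.7078
```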

C. Explore (50 points)


1. Construct the discrete probability distribution table for the following
problems:
a. Flip 3 coins at the same time. Let random variable X be the
number of heads showing.
(Hint: Watch the video here)

Outcome of 3 Tosses of a Coin

Outcome No. of heads


HHH 3 heads
HHT 2 heads
HTH 2 heads
HTT 1 head
THH 2 heads
THT 1 head
TTH 1 head
TTT 0 heads

Discrete Probability Distribution of 3 Tosses of a Coin:

X            0             1             2             3
P(X = x)     1/8 or .125   3/8 or .375   3/8 or .375   1/8 or .125

The sum of all the probabilities in the distribution is equal to 1.

$\frac{1}{8} + \frac{3}{8} + \frac{3}{8} + \frac{1}{8} = \frac{8}{8} = 1$
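
The same distribution can be built by listing every outcome of three coin tosses and counting heads. This is a minimal sketch with illustrative names, using only the Python standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of tossing 3 coins, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Count how many outcomes show 0, 1, 2 or 3 heads.
heads_counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value of X = number of heads.
distribution = {
    x: Fraction(count, len(outcomes)) for x, count in sorted(heads_counts.items())
}

for x, p in distribution.items():
    print(x, p, float(p))

print(sum(distribution.values()))   # 1
```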

Histogram of the Probability Distribution of 3 Tosses of a Coin:

b. Construct the discrete probability distribution table for random
variable X which would be the sum of 2 rolled dice.
(Hint: Watch the video here)

Sample space of 2 Rolled Dice

(1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5) (5, 6)
(6, 1) (6, 2) (6, 3) (6, 4) (6, 5) (6, 6)

The table below represents the possible values of the random variable X and their
corresponding probabilities:

Outcome                                          Sum of 2 Dice   Probability
{(1, 1)}                                         2               1/36
{(1, 2), (2, 1)}                                 3               2/36
{(1, 3), (2, 2), (3, 1)}                         4               3/36
{(1, 4), (2, 3), (3, 2), (4, 1)}                 5               4/36
{(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)}         6               5/36
{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} 7               6/36
{(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}         8               5/36
{(3, 6), (4, 5), (5, 4), (6, 3)}                 9               4/36
{(4, 6), (5, 5), (6, 4)}                         10              3/36
{(5, 6), (6, 5)}                                 11              2/36
{(6, 6)}                                         12              1/36

Probability Histogram of 2 Rolled Dice:

Discrete Probability Distribution of 2 Rolled Dice

X          2     3     4     5     6     7     8     9     10    11    12
P(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The sum of all the probabilities in the distribution is equal to 1.

$\frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \frac{4}{36} + \frac{5}{36} + \frac{6}{36} + \frac{5}{36} + \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = \frac{36}{36} = 1$
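
The distribution of the sum of two dice can be reproduced by enumerating all 36 ordered pairs. This is a minimal sketch with illustrative names; probabilities are printed as counts over 36 so they match the unreduced fractions in the table.

```python
from collections import Counter
from itertools import product

# All 36 equally likely ordered pairs (first die, second die).
rolls = list(product(range(1, 7), repeat=2))

# Number of pairs that produce each possible sum.
sum_counts = Counter(first + second for first, second in rolls)

for total in sorted(sum_counts):
    print(f"{total:2d}  {sum_counts[total]}/{len(rolls)}")

print(sum(sum_counts.values()) / len(rolls))   # 1.0
```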

2. Watch video lecture: “A Practical Example of Probability Distribution”.


Perform the instructions provided in the video lecture for you to further
understand the following types of distribution using histogram and
scatterplots in MS Excel:
a. Normal Distribution
b. Student’s t Distribution
c. Poisson Distribution
d. Binomial Distribution
e. Exponential Distribution
f. Logistic Distribution

Provide screenshots for each type of distribution that you made.


Download the files mentioned in the video in our Google classroom
class.

Figure 1: Normal Distribution of Overall Stats

The figure above shows the distribution of the Overall column, which
represents the quality of a player on a scale of 1 to 100. This value is a
weighted average of the many individual stats each player has. The graph
is bell-shaped and resembles a normal distribution. The Overall value is
not entirely discrete but rather an approximation. One of the main
characteristics of a normal distribution is symmetry, and the Overall
values are symmetrically distributed, so we can reasonably consider the
game balanced and suitable for competitive play.

Figure 2: Student’s t Distribution of Overall, First 30 Players

The figure above shows the histogram of the Overall values of the first 30
players in the data set, based on their ID number. The graph resembles a
Student’s t distribution, which suits small samples such as this one. It
is symmetric, which again suggests that the game is balanced.

Figure 3: Poisson Distribution of the Player’s Age

The figure above shows the Poisson distribution of the players’ ages. Age
is a discrete variable and it starts at 16, so we treat that as the
starting point or origin for the Poisson distribution. Each bar in the
graph shows the likelihood that a given player in the data is a specific
age. The distribution is skewed: younger players outnumber older ones.

Figure 7: Binomial Distribution of Overall and Potential Stats

The figure above shows the binomial distribution of the Overall and
Potential stats. The graph is bell-shaped and can be considered a binomial
distribution.

Figure 4: Exponential Distribution (PDF) of the Daily Views

The figure above shows the scatterplot of the daily views. Most of the
views occur within the first few days. The graph starts at a very high
point and drops quickly: daily views start around 100,000 but fall to
about 20,000 within a week. Once new videos are released and promoted,
viewership drops to around 10,000 per day and steadily decreases as the
video loses relevancy.

Figure 5: Exponential Distribution (CDF) of the Total Views

The figure above shows the scatterplot of the total views. The total views
represent the cumulative number of views up to a given point in time, that
is, the aggregated number of views the video has received. The curve goes
up at a decreasing rate before eventually plateauing, which matches the
CDF of the exponential distribution.
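
The shapes described for the daily views (PDF) and total views (CDF) follow from the exponential distribution's formulas, f(x) = λe^(-λx) and F(x) = 1 - e^(-λx). The sketch below evaluates both; the rate of 0.25 per day is a hypothetical value chosen for illustration and is not estimated from the video's data.

```python
import math

def exp_pdf(x: float, rate: float) -> float:
    """Density of the exponential distribution at x (0 for negative x)."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

def exp_cdf(x: float, rate: float) -> float:
    """Probability that an exponential variable is at most x."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0

rate = 0.25   # hypothetical rate per day, for illustration only

for day in (0, 1, 7, 30):
    print(f"day {day:2d}: pdf = {exp_pdf(day, rate):.4f}, cdf = {exp_cdf(day, rate):.4f}")
```

The density starts at its highest point and falls quickly, while the cumulative curve rises at a decreasing rate and levels off near 1, which is the pattern seen in the two scatterplots.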

Figure 6: Logistic Distribution of the Membership Status

The figure above shows the scatterplot of the membership status. If the
person is a premium member, the value is 1; if not, it is 0. Most people
under the age of 34 do not have a membership, while most people over the
age of 34 do. The data follows the logistic distribution, since the
likelihood of having a membership rises sharply near a specific value.

3. Watch the following video lectures and write down any information that
you find useful about the application of probability to the fields of
finance, statistics and data science.

a. Video Lecture 65: Probability in Finance


Your reaction:

I’ve learned that finance often involves predicting the values and prices
of uncertain future events. One example is option pricing, which
represents how much we are willing to pay to enter the agreement, that is,
the highest premium we would accept. An option allows one of the sides to
decide at a later date whether to go through with a deal. I also learned
that one of the parties must pay a compensation called a premium. Whoever
pays the premium gets to decide whether the deal goes through when the
predetermined point in the future arrives. For example, suppose you pay an
investor a premium of $100 for the option to buy 10 shares of Google at
$1,100 apiece one week from today. There is a 40% chance that the price
will rise to $1,200 and a 60% chance that it will fall to $1,000. If the
price rises, you exercise the option and gain 10 × $100 − $100 = $900; if
it falls, you do not exercise it and lose only the $100 premium. The
expected value is therefore 0.4 × $900 + 0.6 × (−$100) = $300. Since it is
greater than 0, this deal is favorable and we should buy the option. I’ve
also learned about the decision tree, which describes the different
possible payoffs we could get and their associated probabilities. If the
expected value is negative, the deal is disadvantageous because you expect
to lose money. If the expected value is 0, it is known as a “fair deal”,
and taking or not taking the deal makes no difference on average. If the
expected value is positive, the rational move is to follow through with
the deal, because you expect to make a profit. The investor can charge a
higher premium to make it a “fair deal”. So, probability is used to
determine whether an investment opportunity is worth the money. Knowing
how likely or unlikely certain events are helps business people make the
correct calls. Probability plays an important role in finance because many
businesses apply their understanding of uncertainty and probability in
their decision-making. Probability models can greatly help businesses
optimize their policies and make safe decisions. Though complex, these
probability methods can increase the profitability and success of a
business.
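
The option example in this reaction can be worked through as an expected-value calculation. The sketch below uses the figures from the example ($100 premium, 10 shares, $1,100 strike, 40% and 60% scenarios); the function and variable names are illustrative, and it assumes the option is exercised only when that is profitable.

```python
shares = 10
strike = 1_100        # agreed purchase price per share
premium = 100         # paid up front for the option
scenarios = [(0.4, 1_200), (0.6, 1_000)]   # (probability, price in one week)

def option_payoff(price: float, premium: float) -> float:
    """Net payoff: exercise only if the market price is above the strike."""
    return shares * max(price - strike, 0) - premium

expected_value = sum(p * option_payoff(price, premium) for p, price in scenarios)

# Premium at which the expected value is zero, i.e. a "fair deal".
fair_premium = sum(p * shares * max(price - strike, 0) for p, price in scenarios)

print(f"expected value: ${expected_value:.2f}")   # expected value: $300.00
print(f"fair premium:   ${fair_premium:.2f}")     # fair premium:   $400.00
```

Under these assumptions the investor could raise the premium to $400 before the buyer's expected value drops to zero.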

b. Video Lecture 66: Probability in Statistics


Your reaction:

I’ve learned that a statistic is the sample equivalent of a characteristic
of a population data set. An example of a population characteristic is
recording the eye color of everybody in the entire world and finding that
65% have brown eyes. An example of a statistic is surveying 1,000 people
and finding that 60% of them have brown eyes. The field of statistics
focuses predominantly on samples and incomplete data, and we are often
given sample data without knowing which type of distribution it follows.
Probability lays the groundwork for statistics because it defines terms
like mean, variance and expected value. Statistics tries to analyze
numeric and categorical data and see how well it resembles any of the
probability distributions. I’ve also learned about the confidence interval
(CI), which uses sample data to define a range with an associated degree
of certainty; it approximates margins for the mean of the entire
population based on a small sample. The degree of certainty is usually
90%, 95% or 99% and expresses the likelihood of the population mean being
within the interval. Another important aspect of statistics is hypothesis
testing. Any distribution predicts a value for all points within the data
set; that is, the distribution anticipates the actual data points. The
more distributions we know, the easier it is to determine which one we are
dealing with in a given problem. After finding the distribution, we can
build different models such as regressions. We need computer software to
find the appropriate values because the mathematics of regression is
complex and computationally expensive. Lastly, I’ve learned about
mathematical modelling, which is an extension of statistics that data
scientists deal with. Statistics is the natural expansion of probability.
Probability and statistics are closely linked because statistical data are
frequently analyzed to see whether conclusions can legitimately be drawn
about a particular phenomenon and to make predictions about future events.

c. Video Lecture 67: Probability in Data Science


Your reaction:

I’ve learned that statistics constructs the pillars on which data science
is built. The more general the issues, the more we rely on the simpler
concepts of probability, and the more concrete our interests are, the more
we need to rely on data science. Data analysts, data scientists and data
engineers should have a good understanding of statistics and probability.
Data analysis usually means analyzing past data, finding insights and
making reasonable predictions about the future. Another thing I’ve learned
about is the Monte Carlo simulation, which generates artificial data to
test the predictive power of mathematical models. The data are usually not
completely random but follow certain restrictions. Most machine learning
is an extremely fast-paced trial-and-error process: the more predictions a
model makes, the more precise they become. The future is uncertain, so
data scientists often try to predict what will happen based on the
information they have about the past and present. Machine learning and
deep learning have very high predictive power, but they are still not 100%
certain. There are unpredictable events that can occur in real life, like
earthquakes, volcanic eruptions or sudden scientific breakthroughs, that
can completely change the anticipated course of events. Lastly, I’ve
learned that data science is an expansion of probability, statistics and
programming that uses computational technology to solve more advanced
questions, and that it relies on expected values. Learning probability
helps us make informed decisions about the likelihood of events based on
patterns in collected data. In the context of data science, statistical
inferences are often used to analyze or predict trends from data, and
these inferences use probability distributions of data.
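
As an illustration of the Monte Carlo idea mentioned in this reaction, the sketch below generates artificial coin-toss data and compares the simulated frequency of "exactly 2 heads in 3 tosses" with the theoretical probability of 3/8. The function name, trial count and seed are assumptions chosen for the example, not part of the video lecture.

```python
import random

def estimate_two_heads(trials: int, seed: int = 0) -> float:
    """Estimate P(exactly 2 heads in 3 fair coin tosses) by simulation."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    hits = 0
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(3))
        if heads == 2:
            hits += 1
    return hits / trials

theoretical = 3 / 8
estimate = estimate_two_heads(100_000)

print(f"simulated:   {estimate:.4f}")
print(f"theoretical: {theoretical:.4f}")
```

With more trials the simulated frequency tends to move closer to the theoretical value, which is the sense in which artificial data can be used to check a probability model.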

V. References
1. Udemy. 2020. “Complete Data Science Training: Mathematics, Statistics, Python,
Advanced Statistics in Python, Machine & Deep Learning”. Retrieved from:
https://www.udemy.com/course/the-data-science-course-complete-data-science-bootcamp/learn/lecture/
2. Hayes, Andy, Jason Dyer, and Eli Ross. n.d. “Probability”. Retrieved from:
https://brilliant.org/wiki/probability/
3. Hayes, Adam. 2020. “Probability Distribution”. Retrieved from:
https://www.investopedia.com/terms/p/probabilitydistribution.asp
4. NIST. n.d. “Engineering Statistics Handbook”. Retrieved from:
https://www.itl.nist.gov/div898/handbook/eda/section3/eda366.htm
5. MathsIsFun.com. n.d. “Using and Handling Data”. Retrieved from:
https://www.mathsisfun.com/data/index.html
