L2_ Mathematical Preliminaries

The document outlines mathematical preliminaries essential for data science applications, covering basic mathematics, probability concepts, and random variables. It discusses key topics such as vectors, matrices, linear algebra, and different probability approaches including Bayesian and Frequentist methods. Additionally, it explains the significance of random variables, their distributions, and specific types like binomial and continuous random variables.

Uploaded by

carolnjeri0123

ICT583 Data Science Applications

TOPIC 2: Mathematical Preliminaries


Outline
• Basic maths
• Probability
  - Bayesian vs. Frequentist
  - compound events
  - conditional probability
  - random variables
Basic Mathematics
Basic symbols and terminology
• Vectors: an object with both magnitude and
direction. It is a 1-dimensional array
representing a series of numbers.
• We use index notation to denote an
element of the vector: $x_i$ is the i-th element of the vector x.
Basic symbols and terminology
• Matrix: 2-dimensional representation of
arrays of numbers.
• n x m (n by m) denotes the dimension of a
matrix: the matrix has n rows and m
columns.
• If a matrix has the same number of rows
and columns, it is called a square matrix.
Basic symbols and terminology
• Three offices in different locations, each with
the same three departments: HR,
engineering, and management.
Arithmetic symbols
• The uppercase sigma ∑ symbol is a
universal symbol for addition.
• Whatever is to the right of the sigma symbol
is usually something iterable, meaning that
we can go over it one by one (for example, a
vector).
X = [1, 2, 3, 4, 5]
$\sum_i x_i = 1 + 2 + 3 + 4 + 5 = 15$
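In code, sigma notation corresponds to a loop over the elements, or to a built-in summation. A minimal Python sketch using the vector X from this slide (the variable name `total` is illustrative only):

```python
# Sigma notation as code: add up every element of an iterable.
X = [1, 2, 3, 4, 5]

total = 0
for x_i in X:          # go over the elements one by one
    total += x_i

# The built-in sum() does the same thing in one call.
assert total == sum(X)
print(total)  # 15
```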
Arithmetic symbols
• The dot product is an operator like addition and
multiplication. It combines two vectors of the same length
into a single number (a scalar): $a \cdot b = \sum_i a_i b_i$.
• Let's say we have a vector that represents a customer's
sentiments toward three genres of movies: comedy,
romance, and action. On a scale of 1-5, a customer
loves comedies, hates romantic movies, and is alright
with action movies: (5, 1, 3).
Here, 5 denotes their love for comedies, 1 their hatred of romantic
movies, and 3 the customer's indifference toward action movies.
Arithmetic symbols
• Assume that we have two new movies, one of which is a
romantic comedy and the other is a funny action movie. Each
movie has its own vector of qualities:

Here, m1 is our romantic comedy and m2 is our funny action
movie.
Let's compute the recommendation score for each movie. For
movie 1,
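Arithmetic symbols
• The answer we obtain is 28, but what does this number
mean? On what scale is it?
• The best score anyone can ever get is when all values
in both vectors are 5: $5 \cdot 5 + 5 \cdot 5 + 5 \cdot 5 = 75$.
• The lowest possible score is when all values are 1:
$1 \cdot 1 + 1 \cdot 1 + 1 \cdot 1 = 3$.
• So a score of 28 sits on a scale from 3 to 75.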
Arithmetic symbols
• How about movie 2?

Movie 2 scores higher than movie 1, so between the two we would
recommend movie 2 to our user.
This is, in essence, how many movie recommendation engines
work.
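The scoring step can be sketched in a few lines of Python. The customer vector (5, 1, 3) follows the genre description in the slides; the two movie vectors are assumptions, since the slides' values did not survive extraction. `m1` is chosen so that it reproduces the quoted score of 28, and `m2` is purely illustrative.

```python
def dot(a, b):
    """Dot product of two equal-length vectors: sum of pairwise products."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

# Customer preferences for (comedy, romance, action), from the slides.
user = [5, 1, 3]

# ASSUMED movie vectors (not given in the text). m1 is picked so that its
# score matches the 28 quoted in the slides; m2 is illustrative only.
m1 = [4, 5, 1]   # romantic comedy
m2 = [5, 1, 5]   # funny action movie

score_m1 = dot(user, m1)   # 5*4 + 1*5 + 3*1
score_m2 = dot(user, m2)   # 5*5 + 1*1 + 3*5

# Bounds of the scale: every entry 5 versus every entry 1.
best = dot([5, 5, 5], [5, 5, 5])
worst = dot([1, 1, 1], [1, 1, 1])
print(score_m1, score_m2, worst, best)  # 28 41 3 75
```

Under these assumed vectors, movie 2 scores higher, which matches the recommendation made in the slides.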
Linear algebra
• It is an area of mathematics that deals with the math of
matrices and vectors.
• Matrix multiplication
• Movie recommendation example: Recall the user's movie
genre preferences of comedy, romance, and action.
• Now suppose we have 10,000 movies, all with a rating for
these three categories. Can you visualize the matrix
multiplication? A 10,000 x 3 matrix of movie ratings multiplied by the
3 x 1 preference vector gives a 10,000 x 1 vector with one score per movie.
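One way to picture the multiplication is as a dot product repeated for every row of the movie matrix. The sketch below uses plain Python lists; the catalogue of 10,000 movies is randomly generated and entirely hypothetical, and `mat_vec` is an illustrative helper name.

```python
import random

def mat_vec(matrix, vector):
    """Multiply an (n x m) matrix, stored as a list of rows, by a length-m vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

user = [5, 1, 3]  # (comedy, romance, action) preferences from the slides

# HYPOTHETICAL catalogue: 10,000 movies with random 1-5 ratings per genre.
rng = random.Random(0)
movies = [[rng.randint(1, 5) for _ in range(3)] for _ in range(10_000)]

scores = mat_vec(movies, user)   # one recommendation score per movie
best_index = max(range(len(scores)), key=scores.__getitem__)
print(len(scores), best_index, scores[best_index])
```

In practice a numerical library would do this in a single call; the list version is only meant to show the shape of the computation.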
Logarithms/exponents
• An exponent tells you how many times you have to
multiply a number by itself, for example $2^4 = 2 \cdot 2 \cdot 2 \cdot 2 = 16$.
• A logarithm is the number that answers the question
"what exponent gets me from the base to this other
number?", for example $\log_2 16 = 4$.
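Logarithms/exponents
• We can use both versions to say the same
thing: $b^y = x$ is equivalent to $\log_b x = y$.
• Examples: $\log_3 81 = 4$ because $3^4 = 81$; $\log_5 125 = 3$ because $5^3 = 125$.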
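The equivalence between the exponent form and the logarithm form can be checked numerically. This is a small sketch with Python's standard `math` module; the specific numbers are illustrative.

```python
import math

# Exponent form: 2 multiplied by itself 4 times.
assert 2 ** 4 == 16

# Logarithm form: which exponent takes base 2 to 16?
print(math.log2(16))  # 4.0

# Other bases use the two-argument form. Results are floats, so compare
# with a tolerance rather than with ==.
print(math.isclose(math.log(81, 3), 4))   # True
print(math.isclose(math.log(125, 5), 3))  # True
```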
Logarithms/exponents
• Example: the number e is around 2.718 and has many practical
applications. A very common application is interest calculation for savings.
Suppose you have $5,000 deposited in a bank with continuously
compounded interest at the rate of 3%. Then we can use the following
formula to model the growth of your deposit:
$A = P e^{rt}$
A denotes the final amount
P denotes the principal investment (5000)
e denotes a constant (2.718)
r denotes the rate of growth (.03)
t denotes the time (in years)
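As a quick sketch, the continuous-compounding model A = P·e^(rt) can be evaluated directly. The principal and rate come from the example; the time horizons and the function name `continuous_growth` are assumptions made for illustration.

```python
import math

def continuous_growth(principal, rate, years):
    """Amount after `years` of continuously compounded interest: A = P * e^(r*t)."""
    return principal * math.exp(rate * years)

P = 5000   # principal, from the example
r = 0.03   # annual rate of growth, from the example

for t in (1, 5, 10):   # illustrative time horizons (assumed)
    print(t, round(continuous_growth(P, r, t), 2))
```

After 10 years the deposit grows to roughly $6,749, since e^0.3 is about 1.35.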
Introduction to probability
Basic definitions
• Procedure: A procedure is an act that leads to a result,
for example, throwing a die or visiting a website.
• Event: A collection of the outcomes of a procedure,
such as getting a head on a coin flip or leaving a
website after only 4 seconds.
• Sample space of a procedure is the set of all possible
simple events.
For example, an experiment is performed in which a
coin is flipped three times in succession. What is the
size of the sample space for this experiment?
The answer is eight. The result could be any one of the
possibilities in the following sample space:
{HHH, HHT, HTT, HTH, TTT, TTH, THH, THT}.
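The sample space for the three coin flips can also be enumerated programmatically. A minimal sketch using `itertools.product` from the standard library; the "exactly two heads" event is an added illustration, not part of the slides.

```python
from itertools import product

# Every outcome of flipping a coin three times in succession.
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]
print(sample_space)
print(len(sample_space))  # 8

# An event is a collection of outcomes, e.g. "exactly two heads".
two_heads = [outcome for outcome in sample_space if outcome.count("H") == 2]
print(two_heads)
```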
Probability
• The probability of an event represents the
frequency, or chance, or the likelihood that
the event will happen.
• If A is an event, P(A) is the probability of the
occurrence of the event.
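• Actual probability of an event A, assuming all outcomes in the
sample space are equally likely:
$P(A) = \frac{\text{number of ways A occurs}}{\text{size of the sample space}}$
• The maximum probability of any event is 1, and the minimum is 0.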
Frequentist approach
In a Frequentist approach, the probability of an
event is calculated through experimentation. It
uses the past in order to predict the future chance
of an event.

Core idea: the relative frequency of an event is how
often the event occurs divided by the total number
of observations.
Frequentist approach
Example: You are interested in ascertaining how often a
person who visits your website is likely to return on a later
date (the rate of repeat visitors).
Event A: a visitor coming back to the site.
Using the Frequentist approach, we can take the visitor logs
and calculate the relative frequency of event A. Suppose
we had 1,458 unique visitors in the past week, of whom 452 were
repeat visitors. We can calculate this as:
$P(A) \approx \frac{452}{1458} \approx 0.31$

The law of large numbers: If we repeat a procedure over
and over, the relative frequency will approach
the actual probability.
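The relative-frequency calculation and the law of large numbers can both be illustrated with a short simulation. The visitor counts come from the example; the "actual" probability of 0.31 used in the simulation is an assumption for demonstration, and `relative_frequency` is a hypothetical helper.

```python
import random

# Relative frequency from the visitor log in the example.
repeat_visitors = 452
unique_visitors = 1458
p_hat = repeat_visitors / unique_visitors
print(round(p_hat, 3))  # 0.31

def relative_frequency(true_p, trials, rng):
    """Simulate `trials` visits and return the observed share of repeat visitors."""
    hits = sum(rng.random() < true_p for _ in range(trials))
    return hits / trials

# ASSUMPTION for illustration only: pretend the actual probability is 0.31.
rng = random.Random(42)
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(0.31, n, rng))
```

As the number of simulated visits grows, the observed share should settle closer to the assumed probability.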
Compound Events

• A compound event is an event that
combines two or more simple events.
• Given events A and B: the probability that
A and B both occur is P(A ∩ B); the probability that either A or B
occurs is P(A ∪ B)
• Example: our universe is 100 people who
showed up for an experiment in which a
new test for cancer is being developed
• The red circle, A, represents 25 people who
actually have cancer.
• The circle B contains people for whom the
test was positive (it claimed that they had
cancer) - 30 people
• A ∩ B are people for whom the test claimed
they were positive for cancer (B), and who
actually do have cancer (A) - 20 people
Conditional Probability

Example: Let's pick an arbitrary person from the
study of 100 people. You are told that their test
result was positive. What is the probability of
them actually having cancer?
So, we are told that event B has already taken
place. The question now is: what is the
probability that they have cancer (event A), given that B
has occurred? This is called the conditional probability of A
given B, or P(A|B).
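Conditional Probability

The conditional probability P(A|B) is defined:
$P(A|B) = \frac{P(A \cap B)}{P(B)}$
In the cancer example, $P(A|B) = \frac{20/100}{30/100} = \frac{2}{3} \approx 0.67$.
Conditional probabilities get interesting only
when events are not independent; otherwise:
$P(A|B) = P(A)$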
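The cancer-test numbers can be pushed through these definitions with exact fractions. This is a sketch using the counts from the example (100 people, 25 with cancer, 30 positive tests, 20 in both groups); the union is computed with the standard inclusion-exclusion identity, which the slides do not state explicitly.

```python
from fractions import Fraction

total = 100          # people in the study
has_cancer = 25      # event A
test_positive = 30   # event B
both = 20            # A and B

p_a = Fraction(has_cancer, total)
p_b = Fraction(test_positive, total)
p_a_and_b = Fraction(both, total)

# Union via inclusion-exclusion: P(A or B) = P(A) + P(B) - P(A and B).
p_a_or_b = p_a + p_b - p_a_and_b

# Conditional probability: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b

print(p_a_or_b)      # 7/20
print(p_a_given_b)   # 2/3

# A and B are independent only if P(A and B) == P(A) * P(B).
print(p_a_and_b == p_a * p_b)  # False
```

The last check shows that having cancer and testing positive are not independent here, which is exactly why the conditional probability differs from P(A) = 1/4.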
Compound Events and Independence

To calculate P(A ∩ B) = P(A and B), we use the
following formula:
P(A ∩ B) = P(A and B) = P(A)P(B|A)
If events A and B are independent, then P(B|A) = P(B), so:
P(A ∩ B) = P(A)P(B)
Independence (which implies zero correlation) is good for
simplifying calculations but bad for prediction.
  - Correlations are the driving force behind
    predictive models.
Bayesian ideas

• Three things and how they
interact with each other:
  - a prior distribution
  - a posterior distribution (what we are trying to find)
  - a likelihood
• Another way to say it: data shapes and
updates our belief
Bayes' Theorem

Let's try thinking about Bayes using the terms
hypothesis and data. Suppose H = your hypothesis
about the given data and D = the data that you are
given.
Bayes can be interpreted as trying to figure out
P(H|D) (the probability that our hypothesis is
correct, given the data at hand).
Bayes' Theorem

$P(H|D) = \frac{P(D|H)\,P(H)}{P(D)}$
• P(H) is the probability of the hypothesis before we
observe the data, called the prior probability, or just
prior
• P(H|D) is what we want to compute, the probability of
the hypothesis after we observe the data, called the
posterior
• P(D|H) is the probability of the data under the given
hypothesis, called the likelihood
• P(D) is the probability of the data under any hypothesis,
called the normalizing constant
Bayes' Theorem

Bayes' theorem is not only a powerful tool in the field of
probability, but is also widely used in the field of
machine learning. Examples include its use in a probability
framework for fitting a model to a training dataset,
referred to as maximum a posteriori (MAP), and in
developing models for classification predictive
modeling problems such as Naive Bayes.
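To make the prior, likelihood, and posterior concrete, here is a minimal sketch that applies Bayes' theorem to the cancer-test counts from the compound-events example. Mapping H to "has cancer" and D to "test positive" is an interpretation added here, and the function name `posterior` is illustrative.

```python
from fractions import Fraction

def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D)."""
    return likelihood * prior / evidence

# Counts from the cancer-test example: H = "has cancer", D = "test positive".
prior = Fraction(25, 100)       # P(H): 25 of 100 people have cancer
likelihood = Fraction(20, 25)   # P(D|H): 20 of those 25 tested positive
evidence = Fraction(30, 100)    # P(D): 30 of 100 people tested positive

print(posterior(prior, likelihood, evidence))  # 2/3
```

The result agrees with the conditional probability computed directly as 20 out of 30 positive tests.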
Random variables
Distributions of Random Variables

Random variables are numerical functions whose
values come with probabilities.

Probability density functions (pdfs) represent
RVs, essentially as histograms.
In the figure, V is the sum of two dice.
Probability/Cumulative Distributions

• The cdf is the running sum of the pdf:
$C(X \le k) = \sum_{x \le k} P(X = x)$
The pdf and cdf contain exactly the same
information, one being the integral/derivative
of the other.
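For the two-dice example, both distributions can be built exactly. The sketch below assumes fair six-sided dice and uses exact fractions; for a discrete variable like this one, the "integral" is a running sum and the "derivative" is a difference of neighbouring cdf values.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Distribution of V, the sum of two fair six-sided dice.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
values = sorted(counts)                              # 2, 3, ..., 12
pmf = {v: Fraction(counts[v], 36) for v in values}

# cdf as the running sum of the individual probabilities.
cdf = dict(zip(values, accumulate(pmf[v] for v in values)))

print(pmf[7])    # 1/6
print(cdf[7])    # 7/12
print(cdf[12])   # 1

# Differencing the cdf recovers the probabilities, so both carry the
# same information.
recovered = {v: cdf[v] - cdf.get(v - 1, 0) for v in values}
assert recovered == pmf
```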
Discrete random variables
• A discrete random variable only takes on a
countable number of possible values, such as the
outcome of a die roll
Discrete random variables
Properties:
• Expected value of a random variable:
the mean value of a long run of repeated
samples of the random variable. This is
sometimes called the mean of the variable.
$E[X] = \mu_X = \sum_i x_i p_i$
Discrete random variables
Properties:
• Variance of a random variable represents the
spread of the variable. It quantifies how far the values
vary around the expected value:
$V[X] = \sigma_X^2 = \sum_i (x_i - \mu_X)^2 p_i$
$\mu_X$ represents the expected value of the variable.
$\sigma_X$ (sigma) is the standard deviation, which is defined
simply as the square root of the variance.
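The definitions of expected value, variance, and standard deviation translate almost word for word into code. A minimal sketch for a fair six-sided die, which is an assumed example; the helper names are illustrative.

```python
from fractions import Fraction
from math import sqrt

def expected_value(pmf):
    """E[X]: sum of x_i * p_i over all values."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V[X]: sum of (x_i - mu)^2 * p_i over all values."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mu = expected_value(die)
var = variance(die)
sigma = sqrt(var)   # standard deviation = square root of the variance

print(mu)    # 7/2
print(var)   # 35/12
print(round(sigma, 3))
```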
Binomial random variables
(discrete)
• Look at a setting in which a single event happens
over and over and we try to count the number of
times the result is positive.
• A binomial setting has four conditions:
  - The possible outcomes are either success or failure
  - The outcome of one trial cannot affect the outcome of another trial
  - The number of trials is set in advance (a fixed sample size)
  - The chance of success of each trial must always be p
Binomial random variables
• The PMF for a binomial random variable is as
follows:
$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}$
X counts the number of successes in a binomial
setting, n = the number of trials, p = the chance of
success of each trial, and k = the number of successes of interest.
Binomial random variables
• Example: Blood types
A couple has a 25% (p) chance of having a child
with type O blood. What is the chance that three (X)
of their five (n) kids have type O blood?
$P(X = 3) = \binom{5}{3} (0.25)^3 (0.75)^2 \approx 0.0879$

$\text{Expected value} = E[X] = \mu_X = \sum_i x_i p_i = 1.25$
$\text{Variance} = V[X] = \sigma_X^2 = \sum_i (x_i - \mu_X)^2 p_i = 0.9375$

So, this family can expect to have one or two kids with type
O blood.
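The blood-type example can be checked with the standard binomial PMF, with the mean and variance then computed straight from their definitions. This is a sketch using `math.comb` (available from Python 3.8); `binomial_pmf` is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 5, 0.25   # five children, 25% chance of type O for each

print(binomial_pmf(3, n, p))   # 0.087890625

# Expected value and variance straight from the definitions.
pmf = {k: binomial_pmf(k, n, p) for k in range(n + 1)}
mean = sum(k * pk for k, pk in pmf.items())
var = sum((k - mean) ** 2 * pk for k, pk in pmf.items())
print(mean, var)
```

So there is a little under a 9% chance that exactly three of the five children have type O blood, and the computed mean and variance come out at about 1.25 and 0.9375.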
Binomial random variables
• Binomial random variables have closed-form
formulas for the exact values of the
expected value and variance:
E(X) = np
V(X) = np(1 − p)
• In the previous example, we can use the
formulas to calculate the exact expected value
and variance:
E(X) = 5(0.25) = 1.25
V(X) = 1.25(0.75) = 0.9375
Continuous random variables

• Unlike a discrete random variable, a continuous
random variable can take on an uncountably infinite number
of possible values, such as any value in an interval.
• We call the function that describes the
distribution a probability density function
(PDF) instead of a PMF.
• If X is a continuous random variable, then there
is a function, f(x), such that for any constants a and b:
$P(a \le X \le b) = \int_a^b f(x)\,dx$
• The function f(x) is known as the probability density function
(PDF).
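Standard normal distribution

• The PDF of a normal distribution is as follows:
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
μ is the mean of the variable and σ is the standard
deviation. The standard normal distribution is the special case μ = 0, σ = 1.

(Figure: normal PDF with μ = 5, σ = 5.)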
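The normal density and the probability of landing in an interval can be sketched with the standard library alone. The parameters μ = 5 and σ = 5 are the ones shown on the slide; writing the CDF with the error function is a standard identity that the slides do not cover, and the helper names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x), written in terms of the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def prob_between(a, b, mu, sigma):
    """P(a <= X <= b): the area under the PDF between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

mu, sigma = 5, 5   # parameters shown on the slide

print(normal_pdf(mu, mu, sigma))                        # height of the peak
print(prob_between(mu - sigma, mu + sigma, mu, sigma))  # about 0.68
```

Roughly 68% of the probability mass lies within one standard deviation of the mean, here between 0 and 10.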
