
DAily_LEC._sep_18_notes

The document provides an overview of random variables, including definitions, types (discrete, continuous, mixed), and their distributions. It discusses probability mass functions (pmf), cumulative distribution functions (cdf), and the concepts of expectation and variance. Additionally, it covers specific distributions such as binomial and Poisson distributions, along with examples and applications in data science.

Notes

Statistical and Mathematical Methods

Statistical and Mathematical Methods for Data Science


DS5003

Muhammad Shahid Ashraf

Shahid Ashraf DS5003 1 / 31

Random Variables

Random Variables Introduction



A random variable is a function of an outcome,

X = f(ω)

Consider the experiment of tossing two coins. We can define X to be a
random variable that counts the number of heads observed in the
experiment. The sample space of the experiment is:
S = {HH, HT, TH, TT}
There are 4 possible outcomes of the experiment; this set is the domain of X.
For each outcome, the associated value of X is:
X(H, H) = 2
X(H, T) = 1
X(T, H) = 1
X(T, T) = 0
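
The mapping from outcomes to values can be sketched in a few lines of Python. This is only an illustration: the names `sample_space` and `X` are chosen here and do not come from the lecture.

```python
from itertools import product

# Sample space for two coin tosses: each outcome is a tuple such as ("H", "T").
sample_space = list(product("HT", repeat=2))

def X(outcome):
    """Random variable: the number of heads in one outcome."""
    return outcome.count("H")

# Print the value that X assigns to every outcome in its domain.
for omega in sample_space:
    print("".join(omega), "->", X(omega))
```

Each outcome receives exactly one value, which is what makes X a function on the sample space.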

Example

Consider an experiment of tossing 3 fair coins and counting the number of heads.
Certainly, the same model suits the number of girls in a family with 3 children,
the number of 1’s in a random binary string of 3 characters, etc.

Distribution of X

For every outcome ω, the variable X takes one and only one value x. This makes the events
{X = x} disjoint and exhaustive, so their probabilities sum to 1:
\( \sum_x P(x) = \sum_x P\{X = x\} = 1 \)

PMF and CDF Distribution of X

For any set A,

\( P\{X \in A\} = \sum_{x \in A} P(x) \)

When A is an interval, its probability can be computed directly from the cdf F(x):
P{a < X ≤ b} = F(b) − F(a)


A simulation of 3 coin tosses, repeated 10,000 times, produces a histogram of the obtained
values of X.
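
The code from the original slide is not reproduced in these notes. A minimal sketch of such a simulation is given below, assuming Python's standard library and a text histogram in place of a plot; the function name `simulate_heads` and the fixed seed are illustrative choices, not the lecturer's.

```python
import random
from collections import Counter

def simulate_heads(n_runs=10_000, n_coins=3, seed=0):
    """Toss n_coins fair coins n_runs times and count each number of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    counts = Counter()
    for _ in range(n_runs):
        heads = sum(rng.random() < 0.5 for _ in range(n_coins))
        counts[heads] += 1
    return counts

counts = simulate_heads()

# Text histogram: one '#' per 100 runs.
for x in range(4):
    bar = "#" * (counts[x] // 100)
    print(f"X = {x}: {counts[x]:5d} {bar}")
```

The relative frequencies `counts[x] / 10_000` should land near the theoretical probabilities, although the exact counts depend on the seed.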

Histogram

The two middle columns, for X = 1 and X = 2, are about 3 times
higher than the columns on each side, for X = 0 and X = 3.
In a run of 10,000 simulations, values 1 and 2 are attained about three
times more often than 0 and 3.
This agrees with our pmf: P(0) = P(3) = 1/8, P(1) = P(2) = 3/8.

Example

A program consists of two modules. The number of errors X1 in the first module has the
pmf P1(x), and the number of errors X2 in the second module has the pmf P2(x),
independently of X1, where
Find the pmf and cdf of Y = X1 + X2, the total number of errors.
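
One way to approach such a problem: because X1 and X2 are independent, P{X1 = x1, X2 = x2} = P1(x1) P2(x2), and the pmf of Y collects these products over all pairs with x1 + x2 = y. The sketch below uses hypothetical pmf values, since the values from the lecture are not available here; `P1`, `P2`, and `pmf_of_sum` are illustrative names.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical pmfs, chosen only for illustration (NOT the values from the lecture).
P1 = {0: 0.5, 1: 0.3, 2: 0.2}
P2 = {0: 0.7, 1: 0.3}

def pmf_of_sum(p1, p2):
    """pmf of X1 + X2 for independent X1 and X2."""
    pmf = defaultdict(float)
    for x1, prob1 in p1.items():
        for x2, prob2 in p2.items():
            # Independence: the joint probability is the product of the marginals.
            pmf[x1 + x2] += prob1 * prob2
    return dict(sorted(pmf.items()))

pmf_Y = pmf_of_sum(P1, P2)

# The cdf is the running total of the pmf over increasing values of y.
cdf_Y = dict(zip(pmf_Y, accumulate(pmf_Y.values())))

for y in pmf_Y:
    print(f"y = {y}: P(y) = {pmf_Y[y]:.2f}, F(y) = {cdf_Y[y]:.2f}")
```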

Types of Random Variables

Discrete random variables are random variables whose range is a
countable set. A countable set can be either a finite set or a
countably infinite set. For instance, in the two-coin example, X is a
discrete variable because its range is the finite set {0, 1, 2}.
Continuous random variables have a range in the form of some
interval, bounded or unbounded, of the real line. The range can also be a union
of several such intervals.
Mixed random variables are a mixture of both
continuous and discrete variables. These variables are more
complicated than the other two types.

Examples of Random Variables

A long jump is formally a continuous random variable because an
athlete can jump any distance within some range. Results of a high
jump, however, are discrete because the bar can only be placed at a
finite number of heights.
Examples of continuous variables include various times (software
installation time, code execution time, connection time, waiting time,
lifetime) and physical variables such as weight, height, and voltage.
A job is sent to a printer.

Distribution of Random vectors

Often we deal with several random variables simultaneously


Computer Configuration
Mobile Purchase
Mobile Call Packages
Car Selection
Degree Selection

Joint and Marginal Distributions

If X and Y are random variables, then the pair (X, Y) is a random
vector.
Its distribution is called the joint distribution of X and Y.
The individual distributions of X and Y are then called the marginal
distributions.
Two vectors are equal, (X, Y) = (x, y), if X = x and Y = y.
The joint probability mass function of X and Y is

P(x, y) = P{(X, Y) = (x, y)} = P{X = x ∩ Y = y}

\( \sum_x \sum_y P(x, y) = 1 \), because {(X, Y) = (x, y)} are exhaustive and mutually
exclusive events.


A computer virus is trying to corrupt two files. The first file will be
corrupted with probability 0.4. Independently of it, the second file will be
corrupted with probability 0.3.
(a) Compute the probability mass function (pmf) of X, the number of
corrupted files.
(b) Draw a graph of its cumulative distribution function (cdf).
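
A possible way to work part (a) is to enumerate the four joint outcomes of the two files and add up the probabilities that give the same value of X. The sketch below does this and also defines the cdf as a step function for part (b); the variable names are illustrative, and the graph itself is left out.

```python
from itertools import product

p_first, p_second = 0.4, 0.3  # corruption probabilities from the problem statement

pmf = {0: 0.0, 1: 0.0, 2: 0.0}
for first, second in product([0, 1], repeat=2):  # 1 = corrupted, 0 = intact
    # The files are independent, so joint probabilities multiply.
    prob = (p_first if first else 1 - p_first) * (p_second if second else 1 - p_second)
    pmf[first + second] += prob

def cdf(x):
    """F(x) = P{X <= x}: a step function with jumps at 0, 1 and 2."""
    return sum(p for value, p in pmf.items() if value <= x)

for x in pmf:
    print(f"P({x}) = {pmf[x]:.2f}, F({x}) = {cdf(x):.2f}")
```

This gives P(0) = 0.42, P(1) = 0.46 and P(2) = 0.12, so F jumps to 0.42 at x = 0, to 0.88 at x = 1 and to 1 at x = 2.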


Random Variables Expectation and Variance


Mean of a Discrete Random Variable

The mean of a discrete random variable, denoted by µ, is the
mean of its probability distribution: \( \mu = \sum_x x P(x) \)

The mean of a discrete random variable X is also called its expected
value and is denoted by E(X): \( E(X) = \sum_x x P(x) \)

Examples

Suppose that P(0) = 0.75 and P(1) = 0.25. Then, in the long run, X
equals 1 only 1/4 of the time; otherwise it equals 0. Suppose we earn $1
every time we see X = 1. On average, we earn $1 every four
observations, or $0.25 per observation.
Consider a variable that takes values 0 and 1 with probabilities P(0)
= P(1) = 0.5.
Consider two users. One receives either 48 or 52 e-mail messages per
day, with a 50-50 chance of each. The other receives either 0 or 100
e-mails, also with a 50-50 chance. Calculate E(X) for both users.
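
These examples can be checked by applying the definition of the expected value, the sum of x P(x) over all values x, to each pmf. In the sketch below each pmf is stored as a dictionary from value to probability; the names `expectation`, `user_1`, and `user_2` are illustrative.

```python
def expectation(pmf):
    """E(X) = sum of x * P(x) over all values x."""
    return sum(x * p for x, p in pmf.items())

earnings = {0: 0.75, 1: 0.25}  # earn $1 whenever X = 1
fair = {0: 0.5, 1: 0.5}        # values 0 and 1, equally likely
user_1 = {48: 0.5, 52: 0.5}    # e-mails per day, first user
user_2 = {0: 0.5, 100: 0.5}    # e-mails per day, second user

for name, pmf in [("earnings", earnings), ("fair", fair),
                  ("user 1", user_1), ("user 2", user_2)]:
    print(f"E({name}) = {expectation(pmf)}")
```

Both users have the same expected value of 50 e-mails per day, even though their distributions look very different.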

Variance and Standard Deviation

Expectation shows where the average value of a random variable is
located, or where the variable is expected to be, plus or minus some
error.
How large could this “error” be, and how much can a variable vary
around its expectation?
In the e-mail example, the actual number of e-mails in the first case
is always close to 50, whereas in the second case it always differs from 50 by 50.
The first random variable, X, is more stable; it has low variability.
The second variable, Y, has high variability.
The variability of a random variable is measured by its distance from the
mean µ = E(X).


Variance of a random variable is defined as the expected squared
deviation from the mean. For discrete random variables, the variance is
\( \sigma^2 = \mathrm{Var}(X) = \sum_x (x - \mu)^2 P(x) \)

Standard deviation is the square root of the variance:
\( \sigma = \mathrm{Std}(X) = \sqrt{\mathrm{Var}(X)} \)
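
As an illustration, these two formulas can be applied to the two e-mail users, who have the same mean of 50 but very different spread. The sketch below is one way to do the arithmetic; the helper names `mean` and `variance` are chosen here for illustration.

```python
from math import sqrt

def mean(pmf):
    """mu = sum of x * P(x)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = sum of (x - mu)^2 * P(x)."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

user_1 = {48: 0.5, 52: 0.5}   # X: always within 2 of the mean
user_2 = {0: 0.5, 100: 0.5}   # Y: always 50 away from the mean

var_1, var_2 = variance(user_1), variance(user_2)
std_1, std_2 = sqrt(var_1), sqrt(var_2)

print(f"Var(X) = {var_1}, Std(X) = {std_1}")
print(f"Var(Y) = {var_2}, Std(Y) = {std_2}")
```

The standard deviations come out as 2 and 50, matching the typical distance of each user's e-mail count from the mean.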

Interpretation of the Standard Deviation

According to Chebyshev’s theorem, at least (1 − 1/k²) × 100% of the
total area under a curve lies within k standard deviations of the mean,
where k is any number greater than 1.
If k = 2, then at least 75% of the area under a curve lies between
µ − 2σ and µ + 2σ.
Chebyshev’s inequality shows that, in general, higher variance implies
higher probabilities of large deviations, and this increases the risk that
a random variable takes values far from its expectation.
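
The bound can be compared with an exact probability for a concrete distribution. The sketch below uses the number of heads in 3 fair coin tosses, whose pmf appears earlier in these notes; the choice of this distribution and the function names are assumptions made for illustration.

```python
from math import sqrt

# pmf of the number of heads in 3 fair coin tosses.
pmf = {0: 1 / 8, 1: 3 / 8, 2: 3 / 8, 3: 1 / 8}

mu = sum(x * p for x, p in pmf.items())
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))

def chebyshev_bound(k):
    """Lower bound on P{|X - mu| < k * sigma}, valid for k > 1."""
    return 1 - 1 / k**2

def prob_within(k):
    """Exact P{|X - mu| < k * sigma} for the pmf above."""
    return sum(p for x, p in pmf.items() if abs(x - mu) < k * sigma)

for k in (1.5, 2, 3):
    print(f"k = {k}: exact = {prob_within(k):.4f}, bound = {chebyshev_bound(k):.4f}")
```

The exact probability is never below the bound. Chebyshev's inequality holds for any distribution with a finite variance, so for a specific distribution the bound is often far from tight.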


Random Variables Families of discrete distributions


Bernoulli and Binomial distribution

The Binomial Experiment:

An experiment that satisfies the following four conditions is called a
binomial experiment.
(a) There are n identical trials. In other words, the given experiment is
repeated n times, where n is a positive integer. All of these
repetitions are performed under identical conditions.
(b) Each trial has two and only two outcomes. These outcomes are
usually called a success and a failure, respectively. In case there are
more than two outcomes for an experiment, we can combine outcomes
into two events and then apply the binomial probability distribution.
(c) The probability of success is denoted by p and that of failure by q, and
p + q = 1. The probabilities p and q remain constant for each trial.
(d) The trials are independent. In other words, the outcome of one trial
does not affect the outcome of another trial.
The Binomial Probability Distribution and Binomial Formula

The random variable x that represents the number of successes in n
trials for a binomial experiment is called a binomial random variable.
The probability distribution of x in such experiments is called the
binomial probability distribution.
The binomial probability distribution is applied to find the probability
of x successes in n trials for a binomial experiment.
The number of successes x in such an experiment is a discrete
random variable.

Example

Question: Consider the purchase decisions of the next three
customers who enter the Clothing Store. On the basis of past
experience, the store manager estimates that the probability that any one
customer will make a purchase is 0.30. What is the probability that
two of the next three customers will make a purchase?
Let x be the number of successes (purchases) in a sample of three customers.
x can assume any of the values 0, 1, 2, and 3.
It is a discrete random variable.


Question: What is the probability of making exactly four sales to 10 customers entering the
store?
We have a binomial experiment with n = 10, x = 4, and p = 0.30.
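
The worked solutions from the slides are not reproduced in these notes. The standard binomial formula, P(x) = C(n, x) p^x q^(n − x) with q = 1 − p, can be evaluated directly, as in the sketch below; the helper name `binomial_pmf` is an illustrative choice, and `math.comb` requires Python 3.8 or later.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Two purchases among the next three customers, p = 0.30.
p_two_of_three = binomial_pmf(2, 3, 0.30)

# Exactly four sales among 10 customers, p = 0.30.
p_four_of_ten = binomial_pmf(4, 10, 0.30)

print(f"P(x = 2 | n = 3)  = {p_two_of_three:.4f}")
print(f"P(x = 4 | n = 10) = {p_four_of_ten:.4f}")
```

The two probabilities are 0.189 and approximately 0.2001.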
Example

Five percent of all DVD players manufactured by a large electronics company are
defective. Three DVD players are randomly selected from the production line of this
company. What is the probability that exactly one of these three DVD players is defective?

At the Express House Delivery Service, providing high-quality service to customers is the
top priority of the management. The company guarantees a refund of all charges if a
package it is delivering does not arrive at its destination by the specified time. It is known
from past data that, despite all efforts, 2% of the packages mailed through this company
do not arrive at their destinations within the specified time. Suppose a corporation mails
10 packages through Express House Delivery Service on a certain day.
Using the binomial formula:
(a) Find the probability that exactly one of these 10 packages will not arrive at its
destination within the specified time.
(b) Find the probability that at most one of these 10 packages will not arrive at its
destination within the specified time.

An exciting computer game is released. Sixty percent of players complete all the levels.
Thirty percent of them will then buy an advanced version of the game. Among 15 users,
what is the expected number of people who will buy the advanced version? What is the
probability that at least two people will buy it?
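
A possible way to check answers to these exercises is sketched below. It relies on two assumptions that are not stated on the slide: that the mean of a binomial variable is n p, and that "thirty percent of them" refers to the players who complete all levels, so a user buys the advanced version with probability 0.6 × 0.3 = 0.18. The helper names are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P{X <= x}: the sum of the pmf from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# DVD players: n = 3, p = 0.05, exactly one defective.
dvd_one = binomial_pmf(1, 3, 0.05)

# Express House: n = 10, p = 0.02.
late_exactly_one = binomial_pmf(1, 10, 0.02)
late_at_most_one = binomial_cdf(1, 10, 0.02)

# Computer game: p = 0.6 * 0.3 under the reading described above (an assumption).
p_buy = 0.6 * 0.3
expected_buyers = 15 * p_buy                      # mean of a binomial is n * p
at_least_two = 1 - binomial_cdf(1, 15, p_buy)     # complement of "0 or 1 buyers"

print(f"DVD, exactly one defective:  {dvd_one:.4f}")
print(f"Packages, exactly one late:  {late_exactly_one:.4f}")
print(f"Packages, at most one late:  {late_at_most_one:.4f}")
print(f"Game, expected buyers:       {expected_buyers:.2f}")
print(f"Game, at least two buyers:   {at_least_two:.4f}")
```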

The Poisson Probability Distribution:

The number of rare events occurring within a fixed period of time has a
Poisson distribution.
The following examples also qualify for the application of the Poisson
probability distribution:
(a) The number of accidents that occur on a given highway during a
1-week period
(b) The number of customers entering a grocery store during a 1-hour
interval
(c) The number of television sets sold at a department store during a
given week
The following three conditions must be satisfied to apply the Poisson
probability distribution:
(a) x is a discrete random variable.
(b) The occurrences are random.
(c) The occurrences are independent.

PROPERTIES OF A POISSON EXPERIMENT


The probability of an occurrence is the same for any two intervals of
equal length.
The occurrence or nonoccurrence in any interval is independent of the
occurrence or nonoccurrence in any other interval.


On average, a household receives 9.5 telemarketing phone calls per
week. Using the Poisson probability distribution formula, find the
probability that a randomly selected household receives exactly 6
telemarketing phone calls during a given week.

A washing machine in a laundromat breaks down an average of three
times per month. Using the Poisson probability distribution formula,
find the probability that during the next month this machine will have
(a) exactly two breakdowns
(b) at most one breakdown

The number of emails that I get on a weekday can be modeled by a
Poisson distribution with an average of 0.2 emails per minute.
(a) What is the probability that I get no emails in an interval of length 5
minutes?
(b) What is the probability that I get more than 3 emails in an interval of
length 10 minutes?
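
The Poisson formula itself does not appear in these notes. The standard form is P(x) = λ^x e^(−λ) / x!, where λ is the mean number of occurrences in the interval. The sketch below applies it to the exercises above; the helper names are illustrative, and for the e-mail problem the rate is scaled to the interval length (λ = 0.2 × 5 = 1 and λ = 0.2 × 10 = 2), which assumes a constant rate per minute.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(x) = lam^x * e^(-lam) / x!"""
    return lam**x * exp(-lam) / factorial(x)

def poisson_cdf(x, lam):
    """P{X <= x}: the sum of the pmf from 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))

# Telemarketing calls: lam = 9.5 per week, exactly 6 calls.
calls_six = poisson_pmf(6, 9.5)

# Washing machine: lam = 3 per month.
breakdowns_two = poisson_pmf(2, 3)
breakdowns_at_most_one = poisson_cdf(1, 3)

# E-mails: 0.2 per minute, so lam = 1 for 5 minutes and lam = 2 for 10 minutes.
no_email_5_min = poisson_pmf(0, 0.2 * 5)
more_than_3_in_10_min = 1 - poisson_cdf(3, 0.2 * 10)

print(f"Exactly 6 calls:            {calls_six:.4f}")
print(f"Exactly 2 breakdowns:       {breakdowns_two:.4f}")
print(f"At most 1 breakdown:        {breakdowns_at_most_one:.4f}")
print(f"No e-mails in 5 minutes:    {no_email_5_min:.4f}")
print(f"More than 3 in 10 minutes:  {more_than_3_in_10_min:.4f}")
```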
