Statistics and Probability: Chapters 3 and 4

This document covers the concepts of random variables and probability distributions, defining discrete and continuous random variables along with their probability functions. It explains the characteristics of various probability distributions, including binomial, Poisson, and normal distributions, and provides examples and exercises for calculating expectations, variances, and probabilities. The document emphasizes the importance of these statistical concepts in analyzing data and making predictions.

Uploaded by

merhawitareke27

Chapter II&IV: Probability and Statistics

CHAPTER III AND IV


RANDOM VARIABLE AND PROBABILITY DISTRIBUTION THEORY
RANDOM VARIABLES

Definition 1: A random variable is defined as any variable (function) X whose domain is the
sample space ($\Omega$) of some experiment and whose range is a subset of $\mathbb{R}$.

Example: Consider the experiment of tossing a coin twice.

Hence $\Omega$ = {HH, HT, TH, TT}. Let X be the number of heads obtained; the range
space of X is then $R_X$ = {0, 1, 2}. Since X(HT) = X(TH) = 1, X(HH) = 2 and X(TT) = 0, X is a
random variable.
Definition 2: Let X be a random variable. If X assumes discrete values that can be listed as $x_1, x_2, \ldots, x_n$,
then X is called a discrete random variable. Otherwise it is called a continuous random variable.

Definition 3: Let X be a discrete random variable with possible values $x_1, x_2, \ldots, x_n$.
Let $f(x_i) = P(X = x_i)$ be the probability of $x_i$. Then P(X) is called a probability distribution function
of X if
a.) $P(X = x_i) \ge 0$ for all i

b.) $\sum_{i=1}^{n} P(X = x_i) = 1$

Example 1: Let X = the number of the face that shows up on a single throw of a die. Then

X = x_i        1     2     3     4     5     6
P(X = x_i)    1/6   1/6   1/6   1/6   1/6   1/6

Observe that $P(X = x_i) = 1/6 \ge 0$ for all i = 1, 2, 3, 4, 5, 6

and $\sum_{i=1}^{6} P(X = x_i) = 6 \cdot \frac{1}{6} = 1$. Therefore P(X) is a probability distribution function.
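The two conditions of Definition 3 can be checked mechanically. The following is a minimal sketch, not part of the original notes: the helper name is_valid_pmf and the second (invalid) candidate distribution are illustrative assumptions, and exact fractions are used so the sum can be compared with 1 without rounding error.

```python
from fractions import Fraction


def is_valid_pmf(pmf):
    """Check Definition 3: all probabilities are >= 0 and they sum to 1.

    pmf is a dict mapping each possible value x_i to P(X = x_i).
    """
    probs = list(pmf.values())
    return all(p >= 0 for p in probs) and sum(probs) == 1


# The fair die of Example 1: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# A hypothetical candidate (not from the notes) that sums to 1
# but contains a negative "probability", so it must be rejected.
bad = {0: Fraction(1, 2), 1: Fraction(3, 4), 2: Fraction(-1, 4)}

print(is_valid_pmf(die))  # True
print(is_valid_pmf(bad))  # False
```

The same helper can be applied to each candidate function in the exercise that follows once its values are tabulated.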

Exercise: Check whether the following can serve as legitimate probability distributions or not?

a) (x) = for x = 1, 2, 3, 4

b) h(x) = for x = 0, 1, 2, 3, 4


Definition 4: Let X be a continuous random variable in $\mathbb{R}$. Then f (or P(X)) is said to be the
probability density function (pdf) of X if
a. $f(x) \ge 0$ for all $x \in \mathbb{R}$ (domain)

b. $\int_{-\infty}^{\infty} f(x)\,dx = 1$

N.B.

a.) For any finite numbers a and b, $P(a \le X \le b) = \int_a^b f(x)\,dx$. In this case the probability for X to
be between a and b is defined as the area bounded by f(x), the x-axis and the vertical lines X = a
and X = b.
b.) In a continuous probability distribution, we cannot take the probability at a point, since

$P(X = x_0) = \int_{x_0}^{x_0} f(x)\,dx = 0$.

c.) Consequently, the following probabilities are all equal if X is a continuous random variable:
$P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b)$

Example 2: Suppose that the proportion of time X that an industrial robot is in operation during
a 40-hour work week is a random variable with probability density function (pdf) given by

a) for the above pdf, calculate P(0.5 < X< 0.8) and P(x  0.5)
Solution:

b) Find CDF (F(x)) Exercise


Characteristics of Random variable
Expectation and Variance of a random variable

We want some summary measure of where X is likely to be if we observe it repeatedly.

Definition: The mean or expectation or expected value of a discrete random variable X is

$E(X) = \sum_i x_i P(X = x_i)$, provided the series is absolutely convergent.

Example:
Let the distribution of X be given by P(X=8)=0.7 and P(X=2) = 0.3. Calculate E(X).
Solution: $E(X) = 8(0.7) + 2(0.3) = 5.6 + 0.6 = 6.2$

Theorem:


If a and b are any real constants, then $E(aX + b) = aE(X) + b$.

Recall that X is just a function on the sample space. So for any function h, Y = h(X) is another
function on the sample space, i.e. another random variable. Its expectation E(Y) = E(h(X)) is thus defined, but it
looks difficult to evaluate, as we would first need to calculate the distribution of Y.

Definition
For a random variable X with finite mean $\mu$, the variance of X is

$\mathrm{Var}(X) = E[(X - \mu)^2]$

This will work for other sorts of random variables that we'll meet later, as well as for discrete
ones.
For discrete random variables

$\mathrm{Var}(X) = \sum_i (x_i - \mu)^2 P(X = x_i)$

The variance indicates the spread of X about E(X). If X is a quantity with a
dimension, say a length measured in mm, then its variance has units $\text{mm}^2$. For a measure of
spread in the original units we make use of the standard deviation.

Definition:
The standard deviation of X is the positive square root of the variance. That is

$\sigma_X = \sqrt{\mathrm{Var}(X)}$

Theorem: If X has finite mean then $\mathrm{Var}(X) = E(X^2) - (E(X))^2$. Proof left as an exercise.

Theorem: If a and b are constants, $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$. Proof left as an exercise.
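The definitions and the two theorems can be checked numerically on the small distribution P(X=8)=0.7, P(X=2)=0.3 used earlier. This is only an illustrative sketch: the function names and the constants a = 3, b = 5 are assumptions, and E[h(X)] is computed with the standard result $E[h(X)] = \sum_i h(x_i) P(X = x_i)$, which avoids working out the distribution of h(X) first.

```python
from math import isclose, sqrt


def expectation(pmf, h=lambda x: x):
    """E[h(X)] = sum of h(x) * P(X = x) over a pmf given as {x: probability}."""
    return sum(h(x) * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) = E[(X - mu)^2], computed directly from the definition."""
    mu = expectation(pmf)
    return expectation(pmf, lambda x: (x - mu) ** 2)


pmf = {8: 0.7, 2: 0.3}
mean = expectation(pmf)   # 8(0.7) + 2(0.3) = 6.2
var = variance(pmf)       # 7.56
std_dev = sqrt(var)

# Shortcut theorem: Var(X) = E(X^2) - (E(X))^2
var_shortcut = expectation(pmf, lambda x: x ** 2) - mean ** 2

# Linear transformation Y = aX + b with illustrative constants a = 3, b = 5
a, b = 3, 5
pmf_y = {a * x + b: p for x, p in pmf.items()}

print(isclose(var, var_shortcut))             # True
print(isclose(variance(pmf_y), a**2 * var))   # True
```

Note that the additive constant b shifts every value equally and therefore has no effect on the spread, which is why it drops out of the variance.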


Exercise:
Suppose X takes the values 1, 2 and 3 with respective probabilities 1/2, 1/3 and 1/6. Find E(X)
and var(X), and also E(1/X).

Special Discrete Probability Distributions

1. The binomial distribution


The binomial distribution results when an experiment can have just two results (positive or
negative, for instance), and when you do the experiment several times and count the number of
times you get one particular result. Some examples are the number of progeny of a genetic cross
that have a particular phenotype or the number of vesicles released at a synapse. (In the latter
case, the two possibilities are that a vesicle is released when the neuron fires, or that it is not
released.) It has two parameters, N and p, where N is the number of experiments, and p is the
probability of a positive in any given one. In the synapse example, N would be the number of
vesicles at a particular synapse and p the probability that a particular vesicle will be released. In
the genetic-cross example, N would be the total number of progeny from the cross, and p the probability that a
given progeny has the phenotype you're counting. In addition, we define q as (1 - p) for
convenience. n, the number of positives, can range from 0 to N. The probability of a given n is:

$P(n) = \binom{N}{n} p^n q^{N-n}$

The mean of a binomial is Np and the variance is Npq.


If N is large and p is small, the binomial is approximated by a Poisson with $\lambda = Np$.

Suppose an experiment has n independent trials, where each trial has two mutually exclusive
outcomes, success or failure.
Let p be the probability of success in a single trial,
q = 1 - p the probability of failure, and X the number of successes, where X is a random
variable. Then the probability of obtaining r successes out of n trials of the experiment is given

by $P(X = r) = \binom{n}{r} p^r q^{n-r}$, where r = 0, 1, 2, 3, …, n

Definition:
A random variable X that has the above probability distribution is said to have a Binomial
probability distribution.
Example 1:
Suppose medical experience shows that 20% of the people who have cancer die. What is the
probability that two out of three will survive?
Solution:
Let the random variable X be the number of people who survive. We know that
n = 3, r = 2, p = 80%, q = 20%

Therefore $P(X=2) = \binom{3}{2}(0.8)^2(0.2)^1 = 3(0.64)(0.2) = 0.384$

Example 2:
The probability that a seed will germinate is 0.3. What is the probability that, out of 6 seeds,
a. one b. two c. none d. at least two will germinate?
Solution:
Let X be the random variable counting the number of seeds that germinate, with
n = 6, p = 0.3, q = 0.7

a. r = 1 => $P(X=1) = \binom{6}{1}(0.3)^1 (0.7)^5 = 0.302526$. Parts b.) and c.) are left as an exercise.


d. r ≥ 2 => $P(X \ge 2) = 1 - [P(X=0) + P(X=1)] = 1 - (0.117649 + 0.302526) = 1 - 0.420175 = 0.579825$
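The binomial formula is easy to evaluate with the standard library. The sketch below is an illustration rather than part of the original notes (the function name binomial_pmf is an assumption); it evaluates the seed-germination example with n = 6 and p = 0.3, using the complement rule for "at least two".

```python
from math import comb


def binomial_pmf(r, n, p):
    """P(X = r) = C(n, r) * p^r * (1 - p)^(n - r) for a binomial(n, p) variable."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 6, 0.3

# a. exactly one seed germinates
p_one = binomial_pmf(1, n, p)

# d. at least two germinate: 1 - [P(X = 0) + P(X = 1)]
p_at_least_two = 1 - (binomial_pmf(0, n, p) + binomial_pmf(1, n, p))

# Sanity check: the probabilities over r = 0..n should add up to 1
total = sum(binomial_pmf(r, n, p) for r in range(n + 1))

print(round(p_one, 6))           # 0.302526
print(round(p_at_least_two, 6))  # 0.579825
```

Using the complement keeps the calculation to two terms instead of summing P(X = 2) through P(X = 6).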

2. The Poisson Distribution

The Poisson distribution usually results when you count how many times some event occurs,
when the events occur independently, and when there is no upper bound (or only a very large
upper bound) on the number of events. Good examples are the number of phage that infect a
bacterium, the number of times a particular gene is present in a library of random clones, or the
number of atoms of a radioisotope that decay in a given time. In any given experiment, the
number of events will be a whole number 0, 1, 2, … It has one parameter, the mean $\lambda$, the
average number of events. The probability of seeing n events is:

$P(n) = \frac{\lambda^n e^{-\lambda}}{n!}$

Remark:
The variance of a Poisson is equal to the mean, so the standard deviation is the square root of the
mean.
An important special case is n = 0. This is the probability that the event will not happen even
once, e.g., the probability that your gene will not be present in the library or that a bacterium will
be uninfected:

$P(0) = e^{-\lambda}$

In the binomial distribution, suppose that the constant probability of success $p \to 0$ as $n \to \infty$
with $np = \lambda$ (a constant number). Then the probability of obtaining r successes is given by

$P(X = r) = \frac{\lambda^r e^{-\lambda}}{r!}$, where r = 0, 1, 2, …

Here X is said to have a Poisson distribution with parameter $\lambda$.

Example 1:
Let the probability p of a car having an accident at the intersection of two roads be 0.0002. Suppose
that during a certain period 10000 cars pass. What is the probability that at least three accidents
will occur during that period?


Solution:
Let X be the random variable counting the number of cars having accidents at the
intersection, so r = 3, 4, 5, …, 10000. Here n = 10000 (very large), the probability of an accident
is p = 0.0002, and q = 1 - p = 0.9998. The expected (average) number of accidents is $\lambda = np = 2$.
Therefore $P(X \ge 3) = P(X = 3) + P(X = 4) + \ldots + P(X = 10000)$

$= 1 - [P(X = 0) + P(X = 1) + P(X = 2)] = 1 - e^{-2}\left(1 + 2 + \frac{2^2}{2!}\right) = 1 - 5e^{-2} \approx 0.3233$
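To see how good the Poisson approximation is in this example, the tail probability can be computed both ways. This is a hedged sketch, not part of the original notes: the function names are assumptions, and the exact binomial figure is included only as a cross-check on the approximation with n = 10000 and p = 0.0002.

```python
from math import comb, exp, factorial


def poisson_pmf(r, lam):
    """P(X = r) = lam^r * e^(-lam) / r! for a Poisson variable with mean lam."""
    return lam**r * exp(-lam) / factorial(r)


def binomial_pmf(r, n, p):
    """P(X = r) = C(n, r) * p^r * (1 - p)^(n - r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 10_000, 0.0002
lam = n * p  # expected number of accidents, lambda = np = 2

# P(X >= 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]
poisson_tail = 1 - sum(poisson_pmf(r, lam) for r in range(3))
binomial_tail = 1 - sum(binomial_pmf(r, n, p) for r in range(3))

print(round(poisson_tail, 4))   # 0.3233
print(abs(poisson_tail - binomial_tail) < 1e-4)  # True
```

With n this large and p this small, the two answers agree to several decimal places, which is why the simpler Poisson formula is used in the solution.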

3. Hypergeometric Distribution


Suppose we have N items, which can be divided into two dichotomous groups of sizes r and N - r
(for example, defective/good or pass/fail). If we select n items from the N items without replacement and let X = k be
the number of selected items from the first group of r items, then

$P(X = k) = \frac{\binom{r}{k}\binom{N-r}{n-k}}{\binom{N}{n}}$, k = 0, 1, 2, 3, …

Definition: A discrete random variable having the above probability distribution is said to have
a hypergeometric distribution.
Example:
Small electric motors are shipped in lots of 50. Before such a shipment is accepted, an inspector
chooses 5 of these motors and inspects them. If none of these tested motors are defective, the lot
is accepted. If one or more are found to be defective the entire shipment is inspected. Suppose
that there are in fact three defective motors in the lot, what is the probability that 100%
inspection is required?
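Solution:
Let X be a random variable denoting the number of defective motors found. The whole lot (100%
inspection) will be required if and only if one or more are found to be defective, that is, X ≥ 1.

Therefore, with N = 50, r = 3 and n = 5,
$P(X \ge 1) = 1 - P(X = 0) = 1 - \frac{\binom{3}{0}\binom{47}{5}}{\binom{50}{5}} = 1 - \frac{1533939}{2118760} \approx 0.276$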
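The motor-inspection example can be reproduced with a few lines of code. The sketch below is illustrative only (the function name hypergeom_pmf is an assumption); it uses N = 50 motors in the lot, r = 3 defective, and a sample of n = 5.

```python
from math import comb


def hypergeom_pmf(k, N, r, n):
    """P(X = k): k items from the group of size r when n of N items are drawn
    without replacement."""
    return comb(r, k) * comb(N - r, n - k) / comb(N, n)


N, r, n = 50, 3, 5

# 100% inspection is required when at least one sampled motor is defective
p_full_inspection = 1 - hypergeom_pmf(0, N, r, n)

# Sanity check: the probabilities for k = 0..min(n, r) should sum to 1
total = sum(hypergeom_pmf(k, N, r, n) for k in range(min(n, r) + 1))

print(round(p_full_inspection, 4))  # 0.276
```

So, under these assumptions, roughly 28% of such lots would be sent for full inspection.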


Special kinds of Continuous Probability Distributions


Normal distribution
The normal distribution is the most common distribution for a continuous variable. Most known
classical statistical tests are based on the assumption that the variable being measured is normally
distributed. The formula for the normal distribution looks like this:

$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

where $\mu$ and $\sigma^2$ are the mean and variance of a normally distributed random variable.

Characteristics of Normal Distribution


Any normal distribution has the following characteristics:
• It is bell-shaped, symmetric about the mean, continuous, never touches the x-axis (asymptotic to
the x-axis), and the total area under the curve is 1.
• Approximately 68% of the area lies within one standard deviation of the mean, 95% within two
standard deviations, and 99.7% within three standard deviations of the mean. This is the
Empirical Rule mentioned earlier (in chapter 1).
• Data values are represented by x, which has mean $\mu$ and standard deviation $\sigma$.
• There is good theoretical justification that a Poisson distribution can be approximated
by a normal distribution of the same mean and variance if $\lambda$ is large. Likewise, a
binomial can be approximated by a normal distribution if Np and Nq are both large.
If a continuous random variable X has a normal distribution with mean $\mu$ and variance $\sigma^2$, it is
usually denoted by $X \sim N(\mu, \sigma^2)$ for $-\infty < X < \infty$, where $X, \mu \in \mathbb{R}$ and $\sigma > 0$.

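The p.d.f. of X is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$ => the whole area under a normal

distribution is equal to 1. Therefore any area is a measure of probability.


The shaded region measures the probability that $x_i$ is between a and b and is given by

$P(a \le X \le b) = \int_a^b \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx$

The above is not easy to integrate; instead, we have a ready-made table that gives areas under a
normal distribution.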
How to get an area under a normal distribution
There are two types of normal distribution
a. Non standard normal (NSN)
b. Standard normal (SN)


a. Non-standard normal
The variable is X, with mean $E(X) = \mu$ and variance $\sigma^2 > 0$. The

skewness and kurtosis are, respectively, $\alpha_3 = 0$ and $\alpha_4 = 3$, and the mean deviation is $\sigma\sqrt{2/\pi} = 0.7979\,\sigma$.

b. Standard normal (SN)


The variable is $Z_i$, with $E(Z_i) = \mu = 0$ and $S^2 = \sigma^2 = 1$.

To get an area under a normal distribution, transform the NSN to SN.

The transformation equation is $Z_i = \frac{X_i - \mu}{\sigma}$

The areas to the left and right of Z = 0 are equal (0.5 each).


Some areas under the standard normal curve
Between Z = -1 (X = $\mu - \sigma$) and Z = 1 (X = $\mu + \sigma$): 0.6826 (68.26%)
Between Z = -2 (X = $\mu - 2\sigma$) and Z = 2 (X = $\mu + 2\sigma$): 0.9545 (95.45%)
Between Z = -3 (X = $\mu - 3\sigma$) and Z = 3 (X = $\mu + 3\sigma$): 0.9973 (99.73%)
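These areas can also be computed without a table. The sketch below is an illustration, not part of the original notes: it relies on the standard identity $\Phi(z) = \frac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$ for the standard normal cumulative distribution, and the function names are assumptions.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def area_between(z_lo, z_hi):
    """Area under the standard normal curve between z_lo and z_hi."""
    return std_normal_cdf(z_hi) - std_normal_cdf(z_lo)


# Areas within 1, 2 and 3 standard deviations of the mean
for k in (1, 2, 3):
    print(k, round(area_between(-k, k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The first value differs from the table figure in the last digit only because four-decimal tables round each half (0.3413) before doubling it.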
Numerical example
The weekly income or salary of workers in a factory is normally distributed with mean $200 and
variance 2,500 (so the standard deviation is $50).
a. If an individual is chosen at random, what is the probability that his/her income is
i. at least $325 ii. less than $100
iii. between $150 and $300 iv. between $150 and $180
b. If the total number of employees in the factory is 300, how many of them
have a salary between $150 and $180?

Solution
a.)
i. The information is given in non-standard normal form. Transform the NSN to SN.

$Z = \frac{X - \mu}{\sigma} = \frac{325 - 200}{50} = 2.5$. For Z = 2.5, the standard normal table gives 0.4938, therefore

P(X ≥ 325) = P(Z ≥ 2.5) = 0.5 - 0.4938 = 0.0062. This means that there is a 0.62% chance that
an individual chosen at random from the factory has a salary of at least $325.


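Similarly
ii. P(X < 100)

$Z = \frac{100 - 200}{50} = -2$; the area under the standard normal curve between 0 and 2 is 0.4772, so P(X < 100) = P(Z < -2) = 0.5 - 0.4772 = 0.0228

iii. P(150 ≤ X ≤ 300) = P(-1 ≤ Z ≤ 2) = 0.3413 + 0.4772 = 0.8185

iv. P(150 ≤ X ≤ 180) = P(-1 ≤ Z ≤ -0.4) = 0.3413 - 0.1554 = 0.1859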

b. Number of employees with a salary between $150 and $180 = 300 x P(150 ≤ X ≤ 180)
= 300 x 0.1859 = 55.77 ≈ 56
Thus, around 56 of the 300 employees have a salary between $150 and $180.
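The whole numerical example can be cross-checked in code. This is a minimal sketch under the assumption that salaries follow N(200, 50²) exactly; the function name normal_cdf is illustrative, and the results may differ from the table-based figures in the fourth decimal place because the table values are rounded.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the standardisation Z = (x - mu) / sigma."""
    z = (x - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 200, 50  # mean $200, variance 2,500

p_at_least_325 = 1 - normal_cdf(325, mu, sigma)                    # part i
p_less_than_100 = normal_cdf(100, mu, sigma)                       # part ii
p_150_to_300 = normal_cdf(300, mu, sigma) - normal_cdf(150, mu, sigma)  # part iii
p_150_to_180 = normal_cdf(180, mu, sigma) - normal_cdf(150, mu, sigma)  # part iv

# Part b: expected number of the 300 employees earning between $150 and $180
expected_count = 300 * p_150_to_180

print(round(p_at_least_325, 4), round(p_less_than_100, 4))
print(round(p_150_to_300, 4), round(p_150_to_180, 4))
print(round(expected_count))  # 56
```

Because X is continuous, "at least" and "more than" give the same probability here, so no continuity adjustment is needed.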
Uniform Distribution
Suppose that X is a continuous random variable assuming all values in the interval [a, b], where
both a and b are finite. If the probability density function (pdf) of X is given by

$f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and 0 elsewhere,

then we say that X is uniformly distributed over the interval [a, b].

Theorem:

If X is uniformly distributed over the interval [a, b], then the mean is E(X) = (a + b)/2 and
the variance is $\mathrm{Var}(X) = (b - a)^2/12$.

Example: A point is chosen at random on the line segment [0, 2]. What is the probability that the
chosen point lies between 1 and 1.5?


Solution: Let X be a random variable defined on the interval [0, 2]. Since any value in this
interval is equally probable, we can regard the random variable X as having a uniform

distribution with pdf f(x) = 1/2 for 0 < x < 2, and 0 elsewhere. Hence,

$P(1 \le X \le 1.5) = \int_1^{1.5} \frac{1}{2}\,dx = \frac{1.5 - 1}{2} = 0.25$
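For a uniform distribution, a probability is just the length of the sub-interval divided by the length of [a, b]. The sketch below illustrates this for the line-segment example; the function name uniform_prob is an assumption, and the clipping step simply handles sub-intervals that extend outside [a, b].

```python
def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) for X uniformly distributed over [a, b]."""
    # Only the part of [lo, hi] that overlaps [a, b] carries probability.
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)


a, b = 0.0, 2.0

p = uniform_prob(1.0, 1.5, a, b)   # (1.5 - 1) / (2 - 0)
mean = (a + b) / 2                 # E(X) = (a + b) / 2
var = (b - a) ** 2 / 12            # Var(X) = (b - a)^2 / 12

print(p)     # 0.25
print(mean)  # 1.0
```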

Exponential Distribution

Definition: A continuous random variable X assuming all non-negative values is said to have an
exponential distribution with parameter $\lambda > 0$ if its p.d.f. is given by

$f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, and 0 elsewhere.
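Integrating this density from 0 to x gives the cumulative distribution $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$, so interval probabilities reduce to a difference of two exponentials. The following is a minimal sketch, not part of the original notes: the function names are assumptions, and the rate 0.5 with the interval [1, 2] is a hypothetical choice made purely for illustration.

```python
from math import exp


def exponential_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) for x >= 0, and 0 for x < 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


def exponential_prob(a, b, lam):
    """P(a <= X <= b) = F(b) - F(a) for an exponential variable with rate lam."""
    return exponential_cdf(b, lam) - exponential_cdf(a, lam)


lam = 0.5  # hypothetical rate, chosen only for this illustration
p = exponential_prob(1.0, 2.0, lam)  # e^(-0.5) - e^(-1.0)

print(round(p, 4))  # 0.2387
```

The same two functions apply to any rate and interval, including the assignment that follows.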

Example:
Research has proposed an exponential model with $\lambda = 0.93$ to describe the distribution of a
photon's free path length (μm) under certain circumstances. What is the probability that the path length
is between 1.0 and 3.0? Assignment!
Bivariate Probability Distribution: reading assignment.
Exercise.
a) What is the probability of getting three heads by tossing a fair coin four times? Hint: use the binomial distribution.
b) If 1.6 accidents can be expected at an intersection on any given day, what is the
probability that there will be 3 accidents on any given day?
c) Find the area under the standard normal distribution which lies

d) A random variable X has a normal distribution with mean 80 and standard deviation 4.8.
What is the probability that it will take a value
a) Less than 87.2 b) Greater than 76.4 c) Between 81.2 and 86.0
e) A normal distribution has mean 62.4. Find its standard deviation if 20.0% of the area
under the normal curve lies to the right of 72.9.
f) A random variable has a normal distribution with . Find its mean if the
probability that the random variable will assume a value less than 52.5 is 0.6915.
