
Distribution PPT

The probability that exactly 2 defective bulbs are produced in a 30-minute interval is 0.2510.

Uploaded by

Varchasva Singh
Copyright © All Rights Reserved

By

Dr. Anjula Mehto


Assistant Professor
Thapar Institute of Engineering and Technology,
Patiala, Punjab

1
Discrete and Continuous Data
 When looking at a set of numbers, they are
typically either:
 Discrete (countable)
 Continuous (measurable)

2
Discrete Data
 Refers to individual and countable items
(discrete variables).
 Involves counting rather than measuring.
 Examples-
 Counting the number of computers in each
department.
 Counting the number of students in a class.

3
Discrete Data
 Characteristics-
 Discrete variables take countable numeric values,
typically non-negative integers (5, 10, 15, and so on).
 It can be easily visualized and demonstrated
using simple statistical methods such as bar
charts, line charts, or pie charts.
 It can also be categorical, taking one of a finite
number of values, such as the gender of a
person.

4
Continuous Data
 It is a type of numerical data that can take any of
infinitely many possible values between two
points.
 Continuous data is about precision: values are
limited only by how precisely they are measured.
 Variables in these data sets often have decimal
values.
 Examples-
 Measuring daily wind speed
 Measuring temperature of a city
 Measuring a person’s height
5
Continuous Data
 Characteristics-
 Data changes over time and can have different
values at different time intervals.
 Data is made up of random variables, which may or
may not be whole numbers.
 Data is analyzed using methods such
as line graphs, skewness, and so on.
 Regression analysis is one of the most common
types of continuous data analysis.

6
Statistical Distributions
 Also called a probability distribution.
 Statistical distributions are mathematical
functions that describe the behavior and
characteristics of random variables.
 Statistical distributions help us understand a
problem better by assigning probabilities to the
range of possible values a variable can take, which
makes them very useful in data science and machine learning.

7
Types of Statistical Distributions
 Depending on the type of data, distributions
are grouped into two categories:
 Discrete distributions for discrete data
 Continuous distributions for continuous
data

8
Discrete Distributions
 A discrete distribution is a probability
distribution that describes the probability of
occurrence of each possible outcome in a set
of discrete values.
 It is characterized by a probability mass
function (PMF), which gives the probability of
each possible outcome.

9
Probability Mass Function (PMF)
 Gives the probability of a discrete random
variable taking on a specific value.
 Maps each possible outcome of a random
variable to its probability.
 The PMF is defined as:
 P(X = x)
 X is the discrete random variable
 x is a value that the random variable can take
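A minimal Python sketch of the PMF idea, using an assumed example that is not from the slides (one roll of a fair six-sided die): the PMF is stored as a mapping from each outcome x to P(X = x), and a valid PMF has non-negative values that sum to 1. The names `die_pmf` and `prob` are illustrative only.

```python
from fractions import Fraction

# PMF as a mapping: outcome x -> P(X = x).
# Assumed example: one roll of a fair six-sided die.
die_pmf = {x: Fraction(1, 6) for x in range(1, 7)}


def prob(pmf, x):
    """Return P(X = x); outcomes outside the support have probability 0."""
    return pmf.get(x, Fraction(0))


# A valid PMF has non-negative probabilities that sum to exactly 1.
assert all(p >= 0 for p in die_pmf.values())
assert sum(die_pmf.values()) == 1

print(prob(die_pmf, 3))   # 1/6
print(prob(die_pmf, 7))   # 0
```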

10
Types of Discrete Distributions
 Bernoulli distribution
 Binomial distribution
 Poisson distribution

11
Bernoulli Distribution
 Single Trial with Two Possible Outcomes.
 Any event with a single trial and only two possible
outcomes follows a Bernoulli distribution.
 Example-
 Flipping a coin.
 Choosing between True and False in a quiz.

12
Bernoulli Distribution

13
Bernoulli Distribution
 The PMF of the Bernoulli distribution:
P(X = x) = p^x (1 - p)^(1 - x), x ∈ {0, 1}
where ‘p’ is the probability of success
and (1 - p), or ‘q’, is the probability of failure

14
Bernoulli Distribution
 The expected value or Mean of the Bernoulli
distribution:
E(x) = p
 Variance of the Bernoulli distribution:
Var(x) = p(1-p)
= pq
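As a quick check of these formulas, the sketch below computes the mean and variance directly from the Bernoulli PMF, using E(x) = Σ x·P(X = x) and Var(x) = Σ (x - E(x))²·P(X = x), and compares them with p and p(1 - p). The value p = 0.3 is an arbitrary assumption.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p^x * (1 - p)^(1 - x) for x in {0, 1}; 0 elsewhere."""
    if x not in (0, 1):
        return 0.0
    return p ** x * (1 - p) ** (1 - x)


def bernoulli_moments(p):
    """Mean and variance computed directly from the PMF."""
    mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
    var = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
    return mean, var


p = 0.3  # assumed probability of success
mean, var = bernoulli_moments(p)
print(mean, p)            # E(x) vs. p
print(var, p * (1 - p))   # Var(x) vs. pq (equal up to floating-point rounding)
```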

15
Binomial Distribution
 A sequence of independent Bernoulli trials.
 It can be thought of as the sum of the outcomes of
n independent trials, each following a Bernoulli distribution.
 Therefore, it is used for events with binary outcomes,
where the probability of success (and of failure) is
the same in every trial.
 Example -
 Flipping a coin multiple times to count the
number of heads and tails
16
Binomial Distribution
 Example- If you flip a coin twice, the sample space is
 [{H,H}, {H,T}, {T,H}, {T,T}]
 P(2 heads) = P({H,H}) = ½ * ½ = ¼, P(0 heads) = P({T,T}) = ½ * ½ = ¼
 P(1 head) = P({H,T} or {T,H}) = ½ * ½ + ½ * ½ = ½
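The same two-flip calculation can be reproduced by enumerating the sample space and grouping outcomes by the number of heads, which is the binomial random variable here. This is an illustrative sketch using exact fractions; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))   # (H,H), (H,T), (T,H), (T,T)
p_each = Fraction(1, 2) ** 2               # each outcome: 1/2 * 1/2 = 1/4

# Add up the probability of every outcome that has k heads
heads_dist = Counter()
for outcome in outcomes:
    heads_dist[outcome.count("H")] += p_each

for k in sorted(heads_dist):
    print(k, heads_dist[k])   # 0 1/4, then 1 1/2, then 2 1/4
```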

17
Binomial Distribution
 A binomial distribution is represented by:
B(n, p)
‘n’ is the number of trials,
‘p’ is the probability of success in a single trial
 The probability of exactly x successes in these n trials,
i.e., the PMF, is:
P(X = x) = C(n, x) * p^x * (1 - p)^(n - x), where C(n, x) = n! / (x! (n - x)!)
x = 0, 1, 2, …, n
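A direct transcription of the binomial PMF into Python, as a sketch: `math.comb(n, x)` (Python 3.8+) gives C(n, x). The values n = 4 and p = 0.5 are assumed for illustration; the probabilities over x = 0, …, n sum to 1.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 4, 0.5  # assumed example values
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(probs)       # [0.0625, 0.25, 0.375, 0.25, 0.0625]
print(sum(probs))  # 1.0
```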
18
Binomial Distribution
 Expected value or Mean of a binomial
distribution can be represented as:
E(x) = np
 Similarly, variance is represented as:
Var(x) = np(1-p)
= npq

19
Binomial Distribution
 For example, suppose that a candy company
produces both milk chocolate and dark
chocolate candy bars. The total products
contain half milk chocolate bars and half dark
chocolate bars.
 Suppose we choose ten candy bars at random and
define choosing a milk chocolate bar as a
success.
 n=10, p=1/2=0.5

20
Binomial Distribution
 The probability distribution of the number of
successes during these 10 trials with p = 0.5
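The distribution for the candy example (n = 10, p = 0.5) can be tabulated with a short loop. This sketch only reproduces the numbers behind the plot and assumes nothing beyond the binomial PMF.

```python
from math import comb

n, p = 10, 0.5  # ten candy bars, P(milk chocolate) = 0.5

# P(X = x) for each possible number x of milk chocolate bars
dist = {x: comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)}

for x, prob in dist.items():
    print(f"P(X = {x:2d}) = {prob:.4f}")
```

Because p = 0.5, the distribution is symmetric about x = 5, where it peaks at P(X = 5) = 252/1024 ≈ 0.2461.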

21
Numerical
 Suppose a basketball player makes a free throw
with a probability of 0.7. If the player attempts
10 free throws, what is the probability that
they make exactly 7 of them?

22
Numerical
 Solution- This is a binomial probability problem.
Put the values n = 10, p = 0.7, x = 7 into the binomial
PMF:
P(X = 7) = C(10, 7) * 0.7^7 * 0.3^3
= 120 * 0.0823543 * 0.027
≈ 0.2668
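The free-throw answer can be checked numerically; a minimal sketch:

```python
from math import comb

n, p, x = 10, 0.7, 7   # attempts, success probability, made shots
p_7 = comb(n, x) * p ** x * (1 - p) ** (n - x)
print(round(p_7, 4))   # 0.2668
```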

23
Poisson Distribution
 Models the probability that an event may or may not
occur, and how often.
 It gives the probability of an event happening a
certain number of times (x) within a given
interval of time or space.

24
Poisson Distribution
 Examples-
 The number of phone calls received by a call
center during one hour of operation
 Text messages per hour
 Website visitors per month

25
Poisson Distribution
 Characteristics:
 The events are independent of each other.
 An event can occur any number of times
(within the defined period).
 Two events can’t take place simultaneously.

26
Poisson Distribution
 The probability mass function (PMF) of the
Poisson distribution is:
P(X = x) = (λ^x * e^(-λ)) / x!, x = 0, 1, 2, …
 X = random variable following a Poisson distribution
 x = number of times an event occurs
 P(X = x) = probability that the event occurs x times
 e = Euler’s number (approximately 2.718)
 λ = the average number of times the event occurs in the interval
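A sketch of the Poisson PMF in Python, using only the standard library; λ = 2 is an assumed example rate. The probabilities over all x = 0, 1, 2, … sum to 1, which the last line approximates by truncating at x = 50.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = lam^x * e^(-lam) / x! for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return lam ** x * exp(-lam) / factorial(x)


lam = 2.0  # assumed average number of events per interval
for x in range(5):
    print(x, round(poisson_pmf(x, lam), 4))

# The tail beyond x = 50 is negligible for lam = 2, so this sum is ~1
print(sum(poisson_pmf(x, lam) for x in range(51)))
```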
27
Poisson Distribution
 Expected value or Mean of a Poisson
distribution can be represented as:
E(x) = λ
 Similarly, variance is represented as:
Var(x) = λ

28
Numerical
 Suppose that the call center receives an average
of 10 calls per hour. What is the probability of
receiving 8 or fewer calls during one hour?

29
Numerical
 Solution- Poisson Distribution
 λ= 10
 where λ is the mean or average rate of calls
received by the call center during one hour
 We need P(X ≤ 8),
 where ‘X’ is the random variable representing
the number of calls received by the call center
during one hour.

30
Numerical

P(X ≤ 8) = Σ P(X = x), for x = 0 to 8, where P(X = x) = (10^x * e^(-10)) / x!
P(X = 0) = (10^0 * e^(-10)) / 0! ≈ 0.0000454
P(X = 1) = (10^1 * e^(-10)) / 1! ≈ 0.000454
P(X = 2) = (10^2 * e^(-10)) / 2! ≈ 0.00227
P(X = 3) = (10^3 * e^(-10)) / 3! ≈ 0.00757
P(X = 4) = (10^4 * e^(-10)) / 4! ≈ 0.0189

31
Numerical
P(X = 5) = (10^5 * e^(-10)) / 5! ≈ 0.0378
P(X = 6) = (10^6 * e^(-10)) / 6! ≈ 0.0631
P(X = 7) = (10^7 * e^(-10)) / 7! ≈ 0.0901
P(X = 8) = (10^8 * e^(-10)) / 8! ≈ 0.1126

P(X ≤ 8) ≈ 0.3328
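The nine-term sum above is easy to get wrong by hand; this sketch recomputes each term and the total with the Poisson PMF, λ = 10.

```python
from math import exp, factorial

lam = 10  # average number of calls per hour

# P(X = x) for x = 0, 1, ..., 8
terms = [lam ** x * exp(-lam) / factorial(x) for x in range(9)]
for x, term in enumerate(terms):
    print(f"P(X = {x}) = {term:.6f}")

p_at_most_8 = sum(terms)
print(f"P(X <= 8) = {p_at_most_8:.4f}")   # 0.3328
```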

32
Numerical
1. Suppose that a manufacturing company
produces light bulbs at a rate of 3 defective
bulbs per hour. What is the probability that
exactly 2 defective bulbs are produced in a
30-minute interval?
2. Suppose a factory produces electronic
components, and 5% of the components are
defective. If a sample of 200 components is
randomly selected, what is the probability
that there are fewer than 10 defective
components in the sample?
33
Numerical
 Solution (1)- Poisson Distribution
 λ = (3/60) * 30 = 1.5
where λ is the rate parameter for the Poisson
distribution (the expected number of defective bulbs in 30 minutes)
 x = 2
 Put the values into the Poisson PMF:
P(X = 2) = (e^(-1.5) * 1.5^2) / 2!
P(X = 2) ≈ 0.2510
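A short check of the defective-bulb result, scaling the hourly rate to a 30-minute window before applying the Poisson PMF:

```python
from math import exp, factorial

rate_per_hour = 3
lam = rate_per_hour * 30 / 60   # expected defects in 30 minutes: 1.5
x = 2
p_2 = lam ** x * exp(-lam) / factorial(x)
print(lam, round(p_2, 4))       # 1.5 0.251
```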

34
Numerical
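 Solution (2)- Binomial distribution problem
p = 0.05, n = 200, x < 10 (i.e., x ≤ 9)
P(X < 10) = Σ C(200, x) * 0.05^x * 0.95^(200 - x), for x = 0 to 9
≈ 0.455
 Since the mean is np = 10, the probability of fewer than 10 defectives is a little under one half.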
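The binomial sum for the component problem has ten terms, so it is convenient to compute it in code. This sketch sums the binomial PMF for x = 0 to 9, using only the values given in the problem.

```python
from math import comb

n, p = 200, 0.05   # sample size, probability a component is defective

# P(X < 10) = P(X <= 9) = sum of the binomial PMF for x = 0..9
p_fewer_than_10 = sum(
    comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(10)
)
print(round(p_fewer_than_10, 4))
print(n * p)   # mean number of defectives, np
```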

35
Continuous Distribution
 Describes the distribution of continuous
random variables.
 A continuous random variable can take on any
value within a range or interval of values, as
opposed to a discrete random variable that can
only take on distinct values.
 It is characterized by a probability density
function (PDF).

36
Probability Density Function (PDF)
 Describes the probability distribution of a
continuous random variable.
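 Gives the relative likelihood of a random
variable (X) taking on a particular value (x); the
probability that X falls within a given range of values
(a, b) is the area under the PDF f(x) over that range:
 P(a ≤ X ≤ b) = ∫ f(x) dx, integrated from a to b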
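To illustrate that a probability is an area under the PDF, the sketch below integrates an assumed example density f(x) = 2x on [0, 1] (not from the slides) with a simple midpoint rule. The whole area is 1, and the area from 0.5 to 1 is 1 - 0.5² = 0.75.

```python
def integrate(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def pdf(x):
    """Assumed example density: f(x) = 2x on [0, 1], 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0


print(round(integrate(pdf, 0, 1), 6))    # total area: 1.0
print(round(integrate(pdf, 0.5, 1), 6))  # P(0.5 <= X <= 1): 0.75
```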

37
Types of Continuous Distribution
 Uniform Distribution
 Normal or Gaussian Distribution
 Student’s t-Distribution
 Exponential Distribution

38
Uniform Distribution
 It is a continuous distribution, also called the
rectangular distribution.
 It describes an experiment whose outcome is equally
likely to fall anywhere between two boundaries.
 Example-
 The flight time from Delhi to Hyderabad ranges
from 120 to 150 minutes. If we monitor the flight
time of many commercial flights, it will roughly
follow a uniform distribution.

39
Uniform Distribution
 PDF f(x) = 1 / (b - a) for a ≤ x ≤ b, and f(x) = 0 otherwise
 f(x) is the probability density function of X
 a and b are the lower and upper bounds of the
distribution, respectively.

40
Uniform Distribution
 The Expected value or Mean
E(X) = (a + b) / 2

 Variance
Var(X) = (b - a)^2 / 12
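Applying these formulas to the flight-time example (a = 120, b = 150 minutes) gives a mean of 135 minutes and a variance of 75 minutes². The sketch also includes the uniform CDF, (x - a)/(b - a), which is an addition not shown on the slides.

```python
a, b = 120, 150  # flight-time bounds in minutes (Delhi-Hyderabad example)


def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) on [a, b], 0 elsewhere."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """P(X <= x) for X ~ Uniform(a, b)."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


mean = (a + b) / 2            # 135.0
variance = (b - a) ** 2 / 12  # 75.0
print(mean, variance)
print(uniform_cdf(130, a, b))  # P(X <= 130) = 10/30
```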

41
Normal Distribution
 Symmetric Distribution of Values Around the
Mean
 Also called the Gaussian or bell curve distribution.
 It is the most commonly used distribution in data science.
 Describes the probability of a continuous random
variable that takes real values.
 When plotted, the data follows a bell shape, with
most values clustering around a central region
and tapering off as they go further away from the
center.
42
Normal Distribution
 Example-
 Weights of people in a population
 The scores of a quiz

43
The scores of a quiz
 Many of the students scored between 60 and
80.
 Students whose scores fall well outside this
range (outliers) deviate from the center.

44
Normal Distribution
 Characteristics-
 The random variable takes values from -∞ to
+∞.
 Mean, mode and median (measures of central
tendency) coincide with each other.
 The distribution curve is symmetric about the
mean.
 The area under the curve is equal to 1.

45
Normal Distribution- 68-95-99.7 Rule
 In a normal distribution, about 68% of all
values lie within one standard deviation of the mean.
 Similarly, about 95% of the values lie within two
standard deviations of the mean, and
99.7% lie within three standard deviations
of the mean.
 This last interval captures almost all values.
A data point outside it is most likely
an outlier.
46
Normal Distribution- 68-95-99.7 Rule
 If the mean is 70 and the standard deviation is
10, 68% of the values will lie between 60 and
80, 95% between 50 and 90, and 99.7% between 40 and 100.
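The 68-95-99.7 percentages are rounded values of P(|X - μ| < kσ) for k = 1, 2, 3. A minimal sketch that recovers them with the standard library's error function, using P(|X - μ| < kσ) = erf(k/√2):

```python
from math import erf, sqrt


def within_k_sd(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))   # 0.6827, 0.9545, 0.9973

mu, sigma = 70, 10  # values from the example above
for k in (1, 2, 3):
    print(f"{mu - k * sigma} to {mu + k * sigma}: {within_k_sd(k):.1%}")
```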

47
Normal Distribution
 PDF of normal distribution-
f(x) = (1 / (σ √(2π))) * e^(-(x - μ)^2 / (2σ^2))
-∞ < x < +∞, -∞ < μ < +∞, σ > 0
 μ is the mean (or expectation) of the
distribution
 σ is the standard deviation of the distribution
 ‘x’ is the specific value of the random variable
‘X’
48
Normal Distribution
 The expected value or Mean of a Normal
distribution:
E(x) = μ

 Variance of a Normal distribution:
Var(x) = σ^2

49
Standard Normal Distribution
 Has a mean of zero and a standard deviation
of one.
 The x values of the standard normal
distribution are called z-scores.
 Z-scores are used to find the probability of
a given value occurring in any normal
distribution, by converting that value to the standard normal
distribution.

50
Z-SCORE
 The z-score equals X minus the population
mean (μ), all divided by the standard deviation
(σ): z = (X - μ) / σ
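As a sketch, the z-score formula plus the standard normal CDF, Φ(z) = ½[1 + erf(z/√2)], is enough to turn any normal probability into a standard normal one. The values x = 80, μ = 70, σ = 10 are an assumed example.

```python
from math import erf, sqrt


def z_score(x, mu, sigma):
    """z = (x - mu) / sigma"""
    return (x - mu) / sigma


def std_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution, via math.erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))


z = z_score(80, 70, 10)   # assumed example: x = 80, mu = 70, sigma = 10
print(z, round(std_normal_cdf(z), 4))   # 1.0 0.8413
```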

51
Standard Normal Distribution
 PDF:
f(x) = (1 / √(2π)) * e^(-x^2 / 2)
where -∞ < x < +∞
 Expected value or Mean:
E(x)=0
 Variance:
Var(X)=1

52
Numerical
 The marks of students (X) in a class of 70
students follow a normal distribution with
mean 50 and variance 225. Find
P(40 < X < 60).

53
Numerical
 Solution- Normal Distribution
Mean (μ) = 50
Variance (σ^2) = 225, so the standard deviation σ = √225 = 15
Standardize the distribution using the z-score.
To find the probability P(40 < X < 60), first
find the z-scores for X = 40 and X = 60:
Z1 = (40 - 50) / 15 = -0.67
Z2 = (60 - 50) / 15 = 0.67

54
Numerical
 Solution-
Using a standard normal table (or a calculator), find the
probability of Z lying between -0.67 and 0.67:
P(-0.67 < Z < 0.67) = P(Z < 0.67) - P(Z < -0.67)
= 0.7486 - 0.2514
= 0.4972
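The same probability can be computed without a table. This sketch evaluates Φ with `math.erf`, once with z rounded to two decimals (as when reading a table) and once with the unrounded z = ±2/3.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma = 50, sqrt(225)   # variance 225, so sigma = 15

# z rounded to two decimals, as when reading a z-table
z1 = round((40 - mu) / sigma, 2)   # -0.67
z2 = round((60 - mu) / sigma, 2)   #  0.67
p_table = std_normal_cdf(z2) - std_normal_cdf(z1)

# the same probability with the unrounded z = -2/3 and 2/3
p_exact = std_normal_cdf((60 - mu) / sigma) - std_normal_cdf((40 - mu) / sigma)

print(round(p_table, 4), round(p_exact, 4))
```

With the rounded z the result is roughly 0.4971 (the table answer differs in the last digit because the table entries are themselves rounded); with the unrounded z it is roughly 0.4950.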

55
Student’s t-Distribution
 Small-sample approximation of a
normal distribution.
 It is also known as the ‘t’ distribution.
 Like the standard normal distribution, it has
a bell shape, but it has heavier tails.
 The shape of the t-distribution depends on the
degrees of freedom ‘n’, which is equal to the
sample size ‘k’ minus one.
 Degrees of freedom ‘n’ = k - 1
56
Student’s t-Distribution
 Example-
 Suppose we are dealing with the total apples sold by a
shopkeeper in a month.
 In that case, we would use the normal
distribution.
 If instead we are dealing with the total
number of apples sold in a day, i.e., a smaller
sample, we can use the ‘t’ distribution.

57
Student’s t-Distribution
 As the sample size increases, the t-distribution
approaches the normal distribution, so the
t-distribution can be used for larger sample
sizes as well.

58
Student’s t-Distribution
 PDF =
f(x) = Γ((n + 1)/2) / (√(nπ) * Γ(n/2)) * (1 + x^2/n)^(-(n + 1)/2)
 n = degrees of freedom
 Γ is the gamma function, which is a
generalization of the factorial function to
complex numbers
59
Student’s t-Distribution
 Expected value or Mean
E(x) = 0, for n > 1
 Variance
Var(x) = n/(n-2), for n > 2

n = degrees of freedom
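A sketch of the t PDF above in Python. It uses `math.lgamma` (the logarithm of the gamma function) rather than `math.gamma`, because Γ(n/2) overflows a float for large n. Comparing with the standard normal density at x = 3 shows the heavier tails, and the gap shrinking as n grows; the chosen n values are arbitrary.

```python
from math import exp, lgamma, pi, sqrt


def t_pdf(x, n):
    """Student's t density with n degrees of freedom."""
    # lgamma avoids overflow: gamma(n / 2) exceeds float range for large n
    log_coef = lgamma((n + 1) / 2) - lgamma(n / 2)
    coef = exp(log_coef) / sqrt(n * pi)
    return coef * (1 + x * x / n) ** (-(n + 1) / 2)


def normal_pdf(x):
    """Standard normal density, for comparison."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def t_variance(n):
    """Var(x) = n / (n - 2), defined only for n > 2."""
    if n <= 2:
        raise ValueError("variance is undefined for n <= 2")
    return n / (n - 2)


# Heavier tails: at x = 3 the t density exceeds the normal density,
# and the gap shrinks as the degrees of freedom grow.
for n in (2, 5, 30, 1000):
    print(n, round(t_pdf(3, n), 5), round(normal_pdf(3), 5))

print(t_variance(5))  # n/(n-2) = 5/3
```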

60
Exponential Distribution
 It models the elapsed time between two consecutive events.
 It is concerned with the amount of time until
some specific event occurs.

61
Exponential Distribution
 Example-
 How long do we need to wait before a
customer enters a shop?
 How long will it take before a call center
receives the next phone call?
 How long will a piece of machinery work
without breaking down?

62
Exponential Distribution
 All these questions concern the time we need
to wait before a given event occurs.
 If the waiting time is unknown, it is often
appropriate to think of it as a random variable
having an exponential distribution.

63
Exponential Distribution
 PDF of Exponential Distribution:
f(x; λ) = λe^(-λx) for x ≥ 0 (and 0 for x < 0)
‘x’ is the time between events
λ > 0 is the rate parameter; it is inversely
proportional to the expected duration (μ), with λ = 1/μ

64
Exponential Distribution
 The CDF of the exponential distribution gives
the probability that the time between events is
less than or equal to a specific value x.
 CDF of Exponential Distribution:

F(x; λ) = P(X ≤ x) = 1 - e^(-λx) for x ≥ 0

‘x’ is the time between events


λ >0 is the rate parameter and it is inversely
proportional to expected duration (μ)
65
Exponential Distribution
 Expected value or Mean of Exponential
Distribution-
E(X) = 1/λ
 Variance-
Var(X) = 1/λ²
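A quick simulation sketch: draw many waiting times with Python's `random.expovariate` (whose argument is the rate λ) and compare the sample mean and variance with 1/λ and 1/λ². The rate λ = 0.5 per hour and the seed are assumptions for illustration.

```python
import random
from math import exp

lam = 0.5  # assumed rate: 0.5 events per hour, so E(X) = 2 hours


def exp_cdf(x, lam):
    """F(x) = 1 - e^(-lam * x) for x >= 0, else 0."""
    return 1 - exp(-lam * x) if x >= 0 else 0.0


rng = random.Random(42)  # seeded so the run is reproducible
samples = [rng.expovariate(lam) for _ in range(100_000)]

n = len(samples)
sample_mean = sum(samples) / n
sample_var = sum((s - sample_mean) ** 2 for s in samples) / n
frac_below_2 = sum(s <= 2 for s in samples) / n

print(sample_mean, 1 / lam)           # close to E(X) = 1/lam
print(sample_var, 1 / lam ** 2)       # close to Var(X) = 1/lam^2
print(frac_below_2, exp_cdf(2, lam))  # empirical vs. exact P(X <= 2)
```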

66
Exponential Distribution
 Find the probability that the time
between events is less than or equal to 1
minute if λ = 10 events per hour.

67
Exponential Distribution
 Solution- P(X ≤ 1 minute), λ = 10 per hour
Convert 1 minute into hours: x = 1/60 ≈ 0.0167 hours
Using the CDF of the exponential distribution:
P(X ≤ 0.0167) = 1 - e^(-10*0.0167)
= 1 - e^(-0.167)
≈ 0.154
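Checking the 1-minute answer in code, keeping the rate in events per hour and the time in hours:

```python
from math import exp

lam = 10          # events per hour
x = 1 / 60        # 1 minute, expressed in hours

# CDF: P(X <= x) = 1 - e^(-lam * x)
p_within_1_min = 1 - exp(-lam * x)
print(round(p_within_1_min, 4))   # 0.1535
```

Using the unrounded x = 1/60 gives about 0.1535; rounding x to 0.0167 first gives about 0.1538, so both round to 0.15.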

68
Exponential Distribution
 The time (in hours) required to repair a
machine is an exponentially distributed
random variable with parameter λ = 1/2. What
is the probability that a repair time exceeds 2
hours?

69
Exponential Distribution
 Solution- λ = ½, P(X ≥ 2)
Complement rule
P(X ≥ x) = 1 - P(X ≤ x)
= 1 - [1 - e^(-λx)]
= e^(-λx)
P(X ≥ 2) = 1 - P(X ≤ 2)
= e^(-λx)
= e^(-1/2 * 2) = e^(-1)
≈ 0.368
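The repair-time answer follows from the complement of the CDF; a one-line check:

```python
from math import exp

lam = 0.5   # rate parameter, per hour
x = 2       # hours

# P(X >= x) = 1 - F(x) = e^(-lam * x)
p_exceeds = exp(-lam * x)
print(round(p_exceeds, 4))   # 0.3679
```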
70
Numerical
 The length of life of an instrument produced
by a machine has a normal distribution with a
mean of 12 months and standard deviation of
2 months. Find the probability that an
instrument produced by this machine will last
less than 7 months.

71
Numerical
 Solution-
 X is the value to standardize, X = 7 months
 μ is the mean, μ = 12 months
 σ is the standard deviation, σ = 2 months
 Substituting the given values into the z-score
formula:
 z = (7 - 12) / 2 = -2.5
 P(X < 7) = P(Z < -2.5) ≈ 0.0062, or 0.62%
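A sketch that reproduces the table lookup for P(Z < -2.5) using `math.erf`:

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mu, sigma, x = 12, 2, 7             # months
z = (x - mu) / sigma                # -2.5
p_less_than_7 = std_normal_cdf(z)
print(z, round(p_less_than_7, 4))   # -2.5 0.0062
```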

72
Exponential Distribution
 Suppose that the time between machine
breakdowns at a factory follows an exponential
distribution with a mean of 10 hours. Calculate
the probability that the time between
breakdowns is between 5 and 10 hours.

73
Exponential Distribution
Find P(5 ≤ X ≤ 10)
 Use Interval rule-
 The probability of being inside the interval is the
complement of the probability of being outside it.
 The probability of being outside the
interval is the combined probability of being too
low, P(X ≤ 5), or too high, P(X ≥ 10), for the interval.
 P(5 ≤ X ≤ 10) = 1 - [P(X ≤ 5) + P(X ≥ 10)]

74
Exponential Distribution
Compute P(5 ≤ X ≤ 10) with λ = 1/10
P(5 ≤ X ≤ 10) = 1 - [P(X ≤ 5) + P(X ≥ 10)]
Too low: P(X ≤ 5) = 1 - e^(-0.5) ≈ 0.3935
Too high: P(X ≥ 10) = e^(-1) ≈ 0.3679
Outside = P(X ≤ 5) + P(X ≥ 10) ≈ 0.7613
Inside = P(5 ≤ X ≤ 10) ≈ 1 - 0.7613 = 0.2387
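The interval calculation can be checked in code. The sketch also computes F(10) - F(5) directly, which must give the same value as the complement route.

```python
from math import exp

lam = 1 / 10   # rate = 1 / mean, with a mean of 10 hours


def exp_cdf(x):
    """P(X <= x) = 1 - e^(-lam * x) for x >= 0."""
    return 1 - exp(-lam * x)


too_low = exp_cdf(5)                 # P(X <= 5)  = 1 - e^(-0.5)
too_high = 1 - exp_cdf(10)           # P(X >= 10) = e^(-1)
inside = 1 - (too_low + too_high)    # interval rule
direct = exp_cdf(10) - exp_cdf(5)    # F(10) - F(5), same value

print(round(too_low, 4), round(too_high, 4), round(inside, 4))  # 0.3935 0.3679 0.2387
```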

75
