CL202: Introduction To Data Analysis: MB+SCP

This document provides an introduction to random variables and probability distributions. It defines key concepts such as:
- Random variables allow experiments with uncertain outcomes to be analyzed using numerical values and probability.
- The cumulative distribution function (CDF) gives the probability that a random variable is less than or equal to a value.
- For discrete random variables, the probability mass function (PMF) gives the probability of obtaining each possible value.
- For continuous random variables, the probability density function (PDF) defines the relative likelihood of obtaining values.
- These probability distributions fully characterize random variables and allow calculating any probability regarding their values.

CL202: Introduction to Data Analysis

MB+SCP

Mani Bhushan, Sachin Patwardhan


Department of Chemical Engineering,
Indian Institute of Technology Bombay
Mumbai, India- 400076

mbhushan,[email protected]

Acknowledgements: Santosh Noronha (most of the material from his slides)

Spring 2015

MB+SCP (IIT Bombay) CL202 Spring 2015 1 / 35


CL202: Taking Stock

Today’s Lecture:
Probability (Chapter 3) completed.
Chapter 4 of textbook.
Random variables.



Samples and Populations

Sampled statistics are our estimates of population parameters.


Given data, we calculate a statistic to describe the central value, the spread, the asymmetry of the distribution, etc.
An observed data point is an outcome of a random experiment.



What characterizes a random experiment?

Think of the coin toss as an experiment.


All possible outcomes must be known in advance,
the outcome of a particular trial is not known in advance, and
the experiment can be repeated under identical conditions.



Sample spaces

A sample space S is associated with a random experiment and is the set of all outcomes.
- S for a coin toss is {H, T}
- S for a die roll is {1, 2, 3, 4, 5, 6}
- S for the value of a concentration in a kinetics experiment is (0, ∞)
An event E is a subset of S, and we are usually interested in whether E happens or not.
- E = odd value on a roll of a die
- E = H on tossing a coin
We have seen various counting tricks, conditional probability, Bayes rule, and the total probability law.



Random Variables

The random variable X is a function that assigns a real number to each outcome in the sample space of a random experiment:

X : S → R

The sequence HHTH of coin tosses is not a random variable: a random variable must map outcomes to numbers.
A random variable is denoted by an uppercase letter such as X.
When an experiment is performed, the value obtained by the random variable is denoted by a lowercase letter such as x (e.g. x = 6 feet).



Random Variables (Cont.)

You perform an experiment; let the outcome be denoted by ω. For example, for tossing two coins simultaneously:
ω1 = (H, H), ω2 = (H, T), ω3 = (T, H), ω4 = (T, T).
A random variable X is a function which assigns a number to ω. For the two-coin-toss experiment, with ωi as above, i = 1, 2, 3, 4:
- Case 1: X(ω1) = 1, X(ω2) = 2, X(ω3) = 3, X(ω4) = 4 is a valid random variable mapping.
- Case 2: X(ω1) = −1, X(ω2) = −2, X(ω3) = −3, X(ω4) = −4 is also a valid random variable mapping.
The random variable mapping is not unique.
As a general rule: assigning different numbers to different outcomes is a valid mapping.
Statements about probabilities of experimental outcomes are the same irrespective of the chosen random variable mapping.
Random variables allow us to work in R.



Random variables

Often, the sample space S consists of numbers only. Example: measuring the temperature of a reactor.
The random variable function could then simply be the identity mapping.
A random variable can take discrete or continuous values.
- The sum of two rolls of a die ranges over 2 to 12.
- The average height in this room ranges over (0, ∞).
- Discrete random variable: a random variable with a finite (or countably infinite) range.
- Continuous random variable: a random variable with an interval (either finite or infinite) of real numbers as its range.



What’s the big deal with Random Variables?

Random variables allow us to work in R.
Probability statements can now be made about random variables:
"What is the probability that X exceeds 1?", i.e. P{X > 1}.
This is the probability of all those outcomes for which X takes a value greater than 1, i.e.
P{ω : X(ω) > 1}.
Two-coin-toss experiment: if X is defined on the individual outcomes ωi as in Case 1 earlier, then P{X > 1} = P{ω2, ω3, ω4} = 3/4,
which happens to be the probability of not obtaining two heads.
Similarly, P{X = 1} = P{ω1} = 1/4, and P{X = 5} = P{∅} = 0.
The axioms of probability are satisfied by random variables as well.
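The two-coin-toss bookkeeping above is easy to check by brute force. A minimal Python sketch (the outcome labels and the Case 1 mapping are from the slide; the helper `prob` is ours) enumerates the sample space and recovers the stated probabilities:

```python
from itertools import product

# Sample space for tossing two coins simultaneously; each outcome equally likely.
outcomes = list(product("HT", repeat=2))  # (H,H), (H,T), (T,H), (T,T)

# Case 1 mapping from the slide: X(w1) = 1, X(w2) = 2, X(w3) = 3, X(w4) = 4.
X = {w: i + 1 for i, w in enumerate(outcomes)}

def prob(event):
    """P{w : event(X(w))} under equally likely outcomes."""
    favourable = [w for w in outcomes if event(X[w])]
    return len(favourable) / len(outcomes)

print(prob(lambda x: x > 1))   # P{X > 1} = 3/4
print(prob(lambda x: x == 1))  # P{X = 1} = 1/4
print(prob(lambda x: x == 5))  # P{X = 5} = 0 (empty event)
```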



Random Variables

Measures such as 'mean' and 'variance' can be associated with a random variable.
A function of a random variable defines another random variable.
We can perform several mathematical operations (differentiation, integration) now that we are operating in R and not S.



Cumulative Distribution Function

Associate probability functions with random variables.
The cumulative distribution function (or distribution function) F of a random variable X is defined for any real number x by

F(x) = P{X ≤ x} = P{ω : X(ω) ≤ x}

Valid for both discrete and continuous random variables X.

All probability questions about X can be answered in terms of F.
Find P{a < X ≤ b}:
- Translate the problem in terms of the cumulative distribution function F.
- Note that the event {X ≤ b} is the union of the mutually exclusive events {X ≤ a} and {a < X ≤ b}:

P{X ≤ b} = P{X ≤ a} + P{a < X ≤ b}

or P{a < X ≤ b} = F(b) − F(a).



Example (4.1c)

A random variable X has the distribution function:

F(x) = 0,               x ≤ 0
       1 − exp(−x²),    x > 0

What is the probability that X exceeds 1?

Soln:
P{X > 1} = 1 − P{X ≤ 1} = 1 − F(1) = e^(−1) ≈ 0.368
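The same computation can be done in a couple of lines; a sketch using the F of this example, which also illustrates the interval rule P{a < X ≤ b} = F(b) − F(a) from the previous slide:

```python
import math

def F(x):
    """CDF from Example 4.1c: F(x) = 1 - exp(-x^2) for x > 0, else 0."""
    return 1.0 - math.exp(-x * x) if x > 0 else 0.0

# P{X > 1} = 1 - F(1) = e^{-1}
print(round(1.0 - F(1.0), 3))  # 0.368

# Any interval probability follows from F alone: P{a < X <= b} = F(b) - F(a).
print(F(1.5) - F(0.5))
```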



Probability Mass Function

For a discrete random variable X, the probability mass function p(x) of X is defined as:

p(x) = P{X = x}

p(x) is positive for at most a countable number of values of x. If X assumes one of the values x1, x2, ..., then

p(xi) > 0, i = 1, 2, ...
p(x) = 0 for all other values of x

Obviously, Σ_{i=1}^{∞} p(xi) = 1.
The probability mass function identifies the fraction of outcomes that would be allocated to a specific value.
Consider a series of 10 coin tosses where the outcome = 4 heads.
The probability mass function can tell us how often 4 heads crop up when the 10-coin-toss experiment is repeated N times (N a large number).



Example: Probability Mass Function

p(1)=1/2, p(2)=1/3, p(3)=1/6
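This PMF can be checked and simulated in a few lines; a sketch (the seed and sample size are our arbitrary choices) showing that relative frequencies approach the PMF as N grows:

```python
import random

# PMF from the slide: p(1) = 1/2, p(2) = 1/3, p(3) = 1/6.
pmf = {1: 1/2, 2: 1/3, 3: 1/6}

# A valid PMF must sum to 1 over all its values.
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# Simulate: relative frequencies approach the PMF for large N.
random.seed(0)
N = 100_000
draws = random.choices(list(pmf), weights=list(pmf.values()), k=N)
freq = {v: draws.count(v) / N for v in pmf}
print(freq)  # close to {1: 0.50, 2: 0.33, 3: 0.17}
```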



Cumulative Distribution Function and Probability Mass
Function
The relation is:

F(a) = Σ_{x ≤ a} p(x)

For the p(x) on the previous slide, F(x) is:

F(x) = 0,     x < 1
       1/2,   1 ≤ x < 2
       5/6,   2 ≤ x < 3
       1,     x ≥ 3
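The step CDF comes straight from summing the PMF; a minimal sketch:

```python
# Step CDF F(a) = sum of p(x) over all x <= a, for the PMF of the previous
# slide: p(1) = 1/2, p(2) = 1/3, p(3) = 1/6.
pmf = {1: 1/2, 2: 1/3, 3: 1/6}

def F(a):
    return sum(p for x, p in pmf.items() if x <= a)

print(F(0.5))  # 0 (below the smallest value)
print(F(1.0))  # 0.5
print(F(2.7))  # 5/6 = 0.833...
print(F(3.0))  # 1.0 (up to float rounding)
```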



Probability Density Function
For a continuous random variable X:
the probability density function f(x) is a nonnegative function, defined for all real x ∈ (−∞, ∞), with the property that

P{X ∈ B} = ∫_B f(x) dx

where B is any set of real numbers.

For B = [a, b],

P{a ≤ X ≤ b} = ∫_a^b f(x) dx

i.e. the area under the curve f(x) from a to b.



For a = −∞, b = ∞: ∫_{−∞}^{∞} f(x) dx = 1, i.e. the total area under f(x) is 1.
For a = b: ∫_a^a f(x) dx = 0; the probability that a continuous random variable will assume any particular value is 0.


Probability Density Function

The PDF of X is large where values of X are likely and small where they are unlikely.
For a single value of X:

P{X = a} = ∫_a^a f(x) dx = 0

Therefore endpoints do not matter:

P{a ≤ X ≤ b} = P{a < X < b} = P{a ≤ X < b} = P{a < X ≤ b}

f(x) is not the probability of an event: it can even be greater than one.



Probability Density Function

For f(x) (or fX) to be a valid PDF, it must be:
- Non-negative for all x: f(x) ≥ 0 for all x.
- Normalized: ∫_{−∞}^{∞} f(x) dx = P{−∞ < X < ∞} = 1.
∫_A f(x) dx should represent P{X ∈ A} for any real set A. In particular,

P{a − ε/2 ≤ X ≤ a + ε/2} = ∫_{a−ε/2}^{a+ε/2} f(x) dx ≈ f(a) ε

for small ε,
i.e. f(a) is a measure of how likely it is that the random variable will take values in a small neighbourhood of a.



Relation between probability density function and
cumulative distribution function

F(a) = P{X ∈ (−∞, a]} = ∫_{−∞}^{a} f(x) dx

Differentiating both sides gives:

(d/da) F(a) = f(a)

i.e. the density function is the derivative of the cumulative distribution function.
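The derivative relation can be checked numerically for Example 4.1c, where F(x) = 1 − exp(−x²) gives f(x) = 2x exp(−x²) for x > 0; a finite-difference sketch (the step size h is our choice):

```python
import math

def F(x):
    """CDF from Example 4.1c: 1 - exp(-x^2) for x > 0, else 0."""
    return 1.0 - math.exp(-x * x) if x > 0 else 0.0

def f(x):
    """Analytical derivative of F for x > 0: f(x) = 2x exp(-x^2)."""
    return 2.0 * x * math.exp(-x * x) if x > 0 else 0.0

# Central finite difference approximates (d/da) F(a) at a = 1.
a, h = 1.0, 1e-6
numeric = (F(a + h) - F(a - h)) / (2 * h)
print(abs(numeric - f(a)) < 1e-6)  # True
```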



Example 4.2b

f(x) = C(4x − 2x²),   0 < x < 2
       0,             otherwise

Then:
(i) C = 3/8, from ∫_0^2 f(x) dx = 1.
(ii) P{X > 1} = ∫_1^∞ f(x) dx = 1/2.
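Both numbers can be recovered by numerical integration; a sketch with a hand-rolled trapezoidal rule (the helper name `trapz` is ours):

```python
# Numerically verify Example 4.2b: f(x) = C(4x - 2x^2) on 0 < x < 2
# integrates to 1 only for C = 3/8, and then P{X > 1} = 1/2.
def trapz(g, a, b, n=100_000):
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * (g(a) / 2 + sum(g(a + i * h) for i in range(1, n)) + g(b) / 2)

g = lambda x: 4 * x - 2 * x * x   # unnormalized density on (0, 2)
C = 1.0 / trapz(g, 0.0, 2.0)      # forces total area = 1
print(round(C, 6))                # 0.375 (= 3/8)

f = lambda x: C * g(x)
print(round(trapz(f, 1.0, 2.0), 6))  # 0.5 (P{X > 1})
```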



Extension of Ideas:

Multiple Random Variables: Jointly distributed random variables



Jointly distributed random variables

Often interested in relationships between two or more random variables:


Average number of cigarettes smoked daily and the age at which an
individual gets cancer,
Height and weight of an individual,
Height and IQ of an individual.
Flow-rate and pressure drop of a liquid flowing through a pipe.



Joint Cumulative Probability Distribution Function

For random variables (discrete or continuous) X, Y, the joint cumulative probability distribution function of X and Y is:

F(x, y) = P{X ≤ x, Y ≤ y}

From it, we can compute the probability of any statement concerning the values of X and Y.



Cumulative Distribution Function of Individual Random
Variables

Extract FX(x) from F(x, y) as:

FX(x) = P{X ≤ x} = P{X ≤ x, Y < ∞} = F(x, ∞)

This is the marginal cumulative distribution function of X.

Similarly, extract FY(y) as:

FY(y) = P{Y ≤ y} = P{X < ∞, Y ≤ y} = F(∞, y)



Joint Probability Mass Function (PMF)

Given two discrete random variables X and Y in the same experiment, the
joint PMF of X and Y is

p(xi , yj ) = P(X = xi , Y = yj )

for all pairs of (xi , yj ) values that X and Y can take.


p(xi , yj ) also denoted as pX ,Y (xi , yj ).
The univariate probabilities for X and Y are

pX(x) = P(X = x) = Σ_y pX,Y(x, y)

pY(y) = P(Y = y) = Σ_x pX,Y(x, y)

pX(x) and pY(y) are called marginal PMFs.



Computation of Marginal PMF from Joint PMF

Formally:

{X = xi} = ∪_j {X = xi, Y = yj}

All events on the RHS are mutually exclusive. Thus,

pX(xi) = P{X = xi} = Σ_j P{X = xi, Y = yj} = Σ_j p(xi, yj)

Similarly,

pY(yj) = P{Y = yj} = Σ_i p(xi, yj).

Note: P{X = xi, Y = yj} cannot, in general, be constructed from knowledge of P{X = xi} and P{Y = yj} alone.



Idea can be extended to more than two variables

For three discrete random variables:

pX,Y(xi, yj) = P(X = xi, Y = yj) = Σ_k pX,Y,Z(xi, yj, zk)

pX(xi) = P(X = xi) = Σ_j Σ_k pX,Y,Z(xi, yj, zk)



Example: 4.3a

3 batteries are randomly chosen from a group of 3 new, 4 used but still working, and 5 defective batteries. Let X, Y denote the number of new and used-but-working batteries chosen, respectively. Find p(xi, yj) = P{X = xi, Y = yj}.
Solution: Let T = C(12, 3) = 220.
p(0, 0) = C(5, 3)/T
p(0, 1) = C(4, 1) C(5, 2)/T
p(0, 2) = C(4, 2) C(5, 1)/T
p(0, 3) = C(4, 3)/T
p(1, 0) = C(3, 1) C(5, 2)/T
p(1, 1) = C(3, 1) C(4, 1) C(5, 1)/T
p(1, 2) = ...
p(2, 0) = ...
p(2, 1) = ...
p(3, 0) = ...



Tabular Form

          j = 0     j = 1     j = 2    j = 3    Row sum (P{X = i})
i = 0     10/220    40/220    30/220   4/220    84/220
i = 1     30/220    60/220    18/220   0        108/220
i = 2     15/220    12/220    0        0        27/220
i = 3     1/220     0         0        0        1/220
Col sum   56/220    112/220   48/220   4/220
(P{Y = j})

i indexes the rows and j the columns.

Both the row sums and the column sums add up to 1.
The marginal probabilities appear in the margins of the table.
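Computing the margins of such a table is mechanical; a sketch with exact fractions, with the joint PMF values transcribed from the table:

```python
from fractions import Fraction as Fr

# Joint PMF of Example 4.3a, transcribed from the table: p[(i, j)] = P{X = i, Y = j}.
p = {
    (0, 0): Fr(10, 220), (0, 1): Fr(40, 220), (0, 2): Fr(30, 220), (0, 3): Fr(4, 220),
    (1, 0): Fr(30, 220), (1, 1): Fr(60, 220), (1, 2): Fr(18, 220),
    (2, 0): Fr(15, 220), (2, 1): Fr(12, 220),
    (3, 0): Fr(1, 220),
}

# Marginal PMFs: sum the joint PMF over the other index (the row / column sums).
pX = {i: sum(v for (a, _), v in p.items() if a == i) for i in range(4)}
pY = {j: sum(v for (_, b), v in p.items() if b == j) for j in range(4)}

assert pX[0] == Fr(84, 220) and pY[0] == Fr(56, 220)  # match the table margins
assert sum(p.values()) == 1                           # whole table sums to 1
print(pX[1], pY[1])  # 27/55 28/55 (i.e. 108/220 and 112/220)
```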



Continuous Random Variables

Random variables X, Y.
The joint probability density f(x, y) is a function defined for all real x and y with the property that, for every set C of pairs of real numbers (i.e. C is a set in the two-dimensional plane):

P{(X, Y) ∈ C} = ∫∫_{(x,y)∈C} f(x, y) dx dy

X, Y are then said to be jointly continuous, and f(x, y) is the joint probability density function of X, Y.



Joint Probability Distribution Function

F(a, b) = P{X ∈ (−∞, a], Y ∈ (−∞, b]} = ∫_{−∞}^{b} ∫_{−∞}^{a} f(x, y) dx dy

Thus,

f(a, b) = ∂²F(a, b)/∂a∂b

wherever the partial derivatives exist.
Interpretation of the joint density function: for small ε, δ,

P{a − ε/2 < X < a + ε/2, b − δ/2 < Y < b + δ/2}
    = ∫_{b−δ/2}^{b+δ/2} ∫_{a−ε/2}^{a+ε/2} f(x, y) dx dy ≈ f(a, b) ε δ

f(a, b) is a measure of how likely it is that the random vector (X, Y) will be near (a, b).



Marginal Probability Density

fX(x) = ∫_{−∞}^{∞} f(x, y) dy

since

P{X ∈ A} = P{X ∈ A, Y ∈ (−∞, ∞)}
         = ∫_A ∫_{−∞}^{∞} f(x, y) dy dx
         = ∫_A fX(x) dx

where fX(x) is as defined above.

Similarly,

fY(y) = ∫_{−∞}^{∞} f(x, y) dx



Example 4.3c from book

f(x, y) = 2 e^(−x) e^(−2y),   0 < x < ∞, 0 < y < ∞
          0,                  otherwise

Compute: (a) P{X > 1, Y < 1}, (b) P{X < Y}, (c) P{X < a}.

(a)
P{X > 1, Y < 1} = ∫_0^1 ∫_1^∞ 2 e^(−x) e^(−2y) dx dy
                = ∫_0^1 2 e^(−2y) (−e^(−x) |_1^∞) dy
                = e^(−1) ∫_0^1 2 e^(−2y) dy
                = e^(−1) (1 − e^(−2))
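The closed-form answer e^(−1)(1 − e^(−2)) ≈ 0.318 can be sanity-checked by numerical double integration; a midpoint-rule sketch (truncating the infinite x-range at 40 is our choice, since e^(−40) is negligible):

```python
import math

# Joint density from Example 4.3c: f(x, y) = 2 e^{-x} e^{-2y} on x, y > 0.
f = lambda x, y: 2.0 * math.exp(-x) * math.exp(-2.0 * y)

def dblquad(f, x0, x1, y0, y1, n=400):
    """Midpoint-rule double integral over the rectangle [x0, x1] x [y0, y1]."""
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    return hx * hy * sum(
        f(x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy)
        for i in range(n) for j in range(n)
    )

# (a) P{X > 1, Y < 1}, truncating the x-integration at 40.
approx = dblquad(f, 1.0, 40.0, 0.0, 1.0)
exact = math.exp(-1) * (1 - math.exp(-2))
print(abs(approx - exact) < 1e-3)  # True
```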



THANK YOU

