
Review of Probability Theory

January 12, 2025

1 Elements of Probability
In order to define a probability on a set we need a few basic elements.
• Sample space Ω: The set of all the outcomes of a random experiment. Here, each outcome ω ∈ Ω
can be thought of as a complete description of the state of the real world at the end of the
experiment.
• Set of events (or event space) F: A set whose elements A ∈ F (called events) are subsets of Ω
(i.e., A ⊆ Ω is a collection of possible outcomes of an experiment).
• Probability measure: A function P : F → R that satisfies the following properties:
  – P(A) ≥ 0 for all A ∈ F
  – P(Ω) = 1
  – If A1, A2, . . . are disjoint events (i.e., Ai ∩ Aj = ∅ if i ≠ j), then P(∪i Ai) = Σi P(Ai).
We interpret P(A ∪ B) as the probability of A or B happening, and P(A ∩ B) as the probability of A
and B happening.
Example 1. Consider the experiment of tossing a six-sided die. The sample space is Ω = {1, 2, . . . , 6}. We
can define different event spaces on this sample space. For example, the simplest event space is the
trivial event space F = {∅, Ω}. Another event space is the set of all subsets of Ω. For the first event
space, the unique probability measure satisfying the requirements above is given by P(∅) = 0, P(Ω) = 1.
For the second event space, one valid probability measure is to assign the probability of each set in the
event space to be i/6, where i is the number of elements of the set; for example, P({1, 2, 3, 4}) = 4/6
and P({1, 2, 3}) = 3/6.
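
As an illustration, the following Python sketch (the helper names P and powerset are ours, not part of the notes) enumerates the second event space for the die and checks the three defining properties of this probability measure:

    from itertools import combinations

    omega = {1, 2, 3, 4, 5, 6}

    def powerset(s):
        # all subsets of s: the largest possible event space on Omega
        return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

    def P(event):
        # the measure from Example 1: |A| / |Omega|
        return len(event) / len(omega)

    events = powerset(omega)
    assert all(P(A) >= 0 for A in events)           # P(A) >= 0 for all A in F
    assert P(frozenset(omega)) == 1                 # P(Omega) = 1
    A, B = frozenset({1, 2, 3}), frozenset({4, 5})  # disjoint events
    assert abs(P(A | B) - (P(A) + P(B))) < 1e-12    # additivity for disjoint events
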
Definition 1 (Conditional probability). Let B be an event with non-zero probability. The conditional
probability of any event A given B is defined as

P(A|B) = P(A ∩ B)/P(B).

That is, P(A|B) is the probability measure of the event A after observing the occurrence of the event
B.
Definition 2 (Independence). We say A and B are independent if P(A ∩ B) = P(A)P(B) (or
equivalently, P(A|B) = P(A)). That is, independence means that observing B does not have any
effect on the probability of A.
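
For instance, with the fair die of Example 1, the events A = {1, 2} and B = {2, 4, 6} are independent, as a quick sketch confirms (P here is our shorthand for the uniform measure):

    A, B = {1, 2}, {2, 4, 6}
    P = lambda E: len(E) / 6                      # uniform measure on the fair die
    assert abs(P(A & B) - P(A) * P(B)) < 1e-12    # 1/6 = (1/3)(1/2)
    assert abs(P(A & B) / P(B) - P(A)) < 1e-12    # P(A|B) = P(A)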

2 Random variables
Consider an experiment in which we flip 10 coins, and we want to know the number of coins that
come up heads. Here, the elements of the sample space Ω are 10-length sequences of heads and tails.
For example, we might have

ω = (H, H, T, H, T, H, H, T, T, T ) ∈ Ω.


However, in practice, we usually do not care about the probability of obtaining any particular sequence
of heads and tails. Instead we usually care about real-valued functions of outcomes, such as the number
of heads that appear among our 10 tosses, or the length of the longest run of tails. These functions,
under some technical conditions, are known as random variables.
More formally, a random variable X is a function X : Ω → R. Typically, we will denote random
variables using upper case letters X(ω) or more simply X (where the dependence on the random
outcome ω is implicit). We will denote the value that a random variable may take on using lower case
letters x.

Example 2. In our experiment above, suppose that X(ω) is the number of heads which occur in the
sequence of tosses ω. Given that only 10 coins are tossed, X(ω) can take only a finite number of values,
so it is known as a discrete random variable. Here, the probability of the set associated with a random
variable X taking on some specific value k is

P(X = k) := P({ω : X(ω) = k}).
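
A short simulation sketch (fair coin assumed; the seed and trial count are arbitrary choices) estimates this probability empirically and compares P(X = 5) with its exact value C(10, 5)/2^10 ≈ 0.246:

    import random
    from math import comb

    random.seed(0)
    n_trials = 100_000
    hits = sum(1 for _ in range(n_trials)
               if sum(random.random() < 0.5 for _ in range(10)) == 5)
    print(hits / n_trials)        # empirical estimate of P(X = 5)
    print(comb(10, 5) / 2**10)    # exact value: 0.24609375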

Example 3. Suppose that X(ω) is a random variable indicating the amount of time it takes for a
radioactive particle to decay. In this case, X(ω) takes on an infinite number of possible values, so it is
called a continuous random variable. We denote the probability that X takes on a value between two
real constants a and b (where a < b) as
 
P(a ≤ X ≤ b) := P({ω : a ≤ X(ω) ≤ b}).

2.1 Cumulative distribution functions


In order to specify the probability measures used when dealing with random variables, it is often
convenient to specify alternative functions (CDFs, PDFs, and PMFs) from which the probability
measure governing an experiment immediately follows. In this section and the next two sections, we
describe each of these types of functions in turn.

Definition 3. A cumulative distribution function (CDF) is a function FX : R → [0, 1] which specifies
a probability measure as

FX (x) := P(X ≤ x).
By using this function one can calculate the probability of any event in F. According to the
definition, we know that
lim_{x→−∞} FX (x) = 0,    lim_{x→∞} FX (x) = 1.

Furthermore, if x ≤ y we have FX (x) ≤ FX (y).
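
These properties can be observed empirically. The sketch below (our own illustration, not from the notes) builds an empirical CDF from Uniform[0, 1] samples, whose true CDF is FX (x) = x on [0, 1]:

    import bisect
    import random

    random.seed(1)
    xs = sorted(random.random() for _ in range(100_000))  # X ~ Uniform[0, 1]

    def F_hat(x):
        # fraction of samples <= x, an estimate of F_X(x) = P(X <= x)
        return bisect.bisect_right(xs, x) / len(xs)

    print(F_hat(-1.0), round(F_hat(0.25), 2), F_hat(2.0))  # 0.0, ≈ 0.25, 1.0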

2.2 Probability mass functions


When a random variable X takes on a finite set of possible values (i.e., X is a discrete random
variable), a simpler way to represent the probability measure associated with a random variable is
to directly specify the probability of each value that the random variable can take. In particular, a
probability mass function (PMF) is a function pX : R → [0, 1] such that

pX (x) := P(X = x).

In the case of a discrete random variable, we use the notation Val(X) for the set of possible values that
the random variable X can take. For example, if X(ω) is a random variable indicating the number of
heads out of ten tosses of a coin, then Val(X) = {0, 1, . . . , 10}. It is clear that, for any A ⊆ Val(X),

Σ_{x∈A} pX (x) = P(X ∈ A).
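
As a concrete sketch (the binomial PMF here is our illustrative choice, matching the ten-toss example), a PMF can be stored as a dictionary and summed over a set A:

    from math import comb

    # PMF of X = number of heads in ten fair tosses
    pmf = {k: comb(10, k) / 2**10 for k in range(11)}
    assert abs(sum(pmf.values()) - 1) < 1e-12   # total mass is 1

    A = {0, 1, 2}
    p_A = sum(pmf[x] for x in A)                # P(X in A) = sum of the PMF over A
    print(p_A)                                  # = (1 + 10 + 45) / 1024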


2.3 Probability density functions


For some continuous random variables, the cumulative distribution function FX (x) is differentiable
everywhere. In these cases, we define the probability density function (PDF) as the derivative of the
CDF, i.e.,
fX (x) := dFX (x)/dx.
According to the properties of differentiation, for very small ∆x

P(x ≤ X ≤ x + ∆x) ≈ fX (x)∆x.
Both CDFs and PDFs can be used for calculating the probabilities of different events. But it should
be emphasized that the value of the PDF at a given point x is not the probability of an event, i.e.,
fX (x) ≠ P(X = x). For example, fX (x) can take on values larger than 1 (but the integral of fX (x)
over any subset of R will be at most one).
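
A density exceeding 1 is easy to exhibit: Uniform[0, 1/2] has fX (x) = 2 on its support. The sketch below (the step size is an arbitrary choice) checks that this density nevertheless integrates to 1 via a Riemann sum:

    # density of Uniform[0, 0.5]: 2 on the support, 0 elsewhere
    f = lambda x: 2.0 if 0.0 <= x <= 0.5 else 0.0

    dx = 1e-5
    total = sum(f(i * dx) * dx for i in range(int(1 / dx)))  # Riemann sum on [0, 1]
    print(f(0.25), round(total, 3))  # density 2.0 > 1, yet the integral is 1.0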

2.4 Expectation
Suppose that X is a discrete random variable with PMF pX (x) and g : R 7→ R is an arbitrary
function. In this case, g(X) can be considered as a random variable, and we define the expectation as
E[g(X)] := Σ_{x∈Val(X)} g(x) pX (x).

If X is a continuous random variable with PDF fX (x), then the expected value of g(X) is defined as
E[g(X)] := ∫_{−∞}^{∞} g(x) fX (x) dx.

Intuitively, the expectation of g(X) can be thought of as a “weighted average” of the values that g(x)
can take on for different values of x, where the weights are given by pX (x) or fX (x).
Below we list some useful properties:
• E[af (X)] = aE[f (X)] for any constant a ∈ R.
• (Linearity of expectation) E[f (X) + g(X)] = E[f (X)] + E[g(X)].
• For a discrete random variable X, E[I[X = k]] = P(X = k), where I[·] is an indicator function,
taking value 1 if the event happens and 0 otherwise.
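
As a sketch tying these definitions together (the binomial PMF and g(x) = x² are our illustrative choices), the exact expectation computed from the PMF can be compared against a Monte Carlo estimate:

    import random
    from math import comb

    pmf = {k: comb(10, k) / 2**10 for k in range(11)}   # heads in ten fair tosses
    g = lambda x: x**2

    exact = sum(g(x) * p for x, p in pmf.items())       # E[g(X)] = sum of g(x) p_X(x)

    random.seed(2)
    samples = (sum(random.random() < 0.5 for _ in range(10)) for _ in range(50_000))
    mc = sum(map(g, samples)) / 50_000                  # Monte Carlo estimate of E[g(X)]
    print(exact, round(mc, 2))                          # both near 27.5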

2.5 Variance
The variance of a random variable X is a measure of how concentrated its distribution is around its
mean. Formally, the variance of X is defined as

Var[X] := E[(X − E[X])²].

We can give an alternate expression for the variance:

E[(X − E[X])²] = E[X² − 2E[X]X + (E[X])²] = E[X²] − 2E[X]E[X] + (E[X])²
              = E[X²] − (E[X])².

For any constant a ∈ R, we know that Var[af (X)] = a²Var[f (X)].
Example 4. Calculate the mean and the variance of the uniform random variable X with PDF
fX (x) = 1, ∀x ∈ [0, 1] and 0 otherwise. Then
E[X] = ∫_{−∞}^{∞} x fX (x) dx = ∫_0^1 x dx = 1/2,

E[X²] = ∫_{−∞}^{∞} x² fX (x) dx = ∫_0^1 x² dx = 1/3,

Var[X] = E[X²] − (E[X])² = 1/3 − 1/4 = 1/12.
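
A quick numerical check of this example (sample size and seed are arbitrary):

    import random

    random.seed(3)
    xs = [random.random() for _ in range(200_000)]   # X ~ Uniform[0, 1]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(round(mean, 3), round(var, 4))             # ≈ 0.5 and ≈ 0.0833 = 1/12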


Example 5. Suppose that g(x) = I[x ∈ A] for some subset A ⊆ R. Then

E[g(X)] = ∫_{−∞}^{∞} I[x ∈ A] fX (x) dx = ∫_{x∈A} fX (x) dx = P(X ∈ A).

3 Two Random Variables


Thus far, we have considered single random variables. In many situations, however, there may be
more than one quantity that we are interested in knowing during a random experiment. For instance,
in an experiment where we flip a coin ten times, we may care about both X(ω) = the number of heads
that come up as well as Y (ω) = the length of the longest run of consecutive heads. In this section, we
consider the setting of two random variables.

3.1 Joint and marginal distributions


Suppose that we have two random variables X and Y . One way to work with these two random vari-
ables is to consider each of them separately. If we do that we will only need FX (x) and FY (y). But if we
want to know about the values that X and Y assume simultaneously during outcomes of a random ex-
periment, we require a more complicated structure known as the joint cumulative distribution function
of X and Y , defined by
FXY (x, y) = P(X ≤ x, Y ≤ y).
It can be shown that by knowing the joint cumulative distribution function, the probability of any
event involving X and Y can be calculated. The joint CDF and the marginal distribution functions
FX (x) and FY (y) of each variable separately are related by

FX (x) = lim_{y→∞} FXY (x, y),    FY (y) = lim_{x→∞} FXY (x, y).

3.2 Joint and marginal probability mass functions


If X and Y are discrete random variables, then the joint probability mass function pXY : R × R →
[0, 1] is defined by

pXY (x, y) = P(X = x, Y = y).

We know that Σ_{x∈Val(X)} Σ_{y∈Val(Y)} pXY (x, y) = 1. We also have the following relationship between
the joint PMF and the marginal PMF of each variable:

pX (x) = Σ_{y∈Val(Y)} pXY (x, y).
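
As a sketch (the joint table below is a made-up example, not from the notes), marginalization is just a row or column sum over a joint PMF table:

    # joint PMF of two binary random variables, stored as a dictionary
    p_xy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
    assert abs(sum(p_xy.values()) - 1) < 1e-12

    def p_x(x):
        # marginal PMF: p_X(x) = sum over y of p_XY(x, y)
        return sum(p for (xx, _), p in p_xy.items() if xx == x)

    print(p_x(0), p_x(1))   # 0.5, 0.5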

3.3 Joint and marginal probability density functions


Let X and Y be two continuous random variables with joint distribution function FXY . In the case
that FXY (x, y) is everywhere differentiable in both x and y, then we can define the joint probability
density function
fXY (x, y) = ∂²FXY (x, y) / ∂x∂y.
Like in the single-dimensional case, fXY (x, y) ̸= P(X = x, Y = y), but we have
∫∫_{(x,y)∈A} fXY (x, y) dx dy = P((X, Y ) ∈ A).

Analogous to the discrete case, we define


fX (x) = ∫_{−∞}^{∞} fXY (x, y) dy

as the marginal probability density function of X.


3.4 Conditional distributions


Conditional distributions seek to answer the question: what is the probability distribution over Y ,
when we know that X must take on a certain value x? In the discrete case, the conditional probability
mass function of Y given X is defined as

pY |X (y|x) = pXY (x, y)/pX (x), if pX (x) ≠ 0.
In the continuous case, the situation is technically a little more complicated because the probability
that a continuous random variable X takes on a specific value x is equal to 0. Ignoring this technical
point, we simply define the conditional probability density of Y given X = x to be
fY |X (y|x) = fXY (x, y)/fX (x), if fX (x) ≠ 0.
Example 6. Suppose we know that a die roll was odd, and we want to know the probability that a
“one” was thrown. Let X be the random variable of the die roll, and let Y be an indicator variable
that takes on the value 1 if the roll turns up odd. Then we can write our desired probability as follows:

P(X = 1|Y = 1) = P(X = 1, Y = 1)/P(Y = 1) = (1/6)/(1/2) = 1/3.
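
The same computation as a brute-force sketch over the sample space (our own illustration):

    faces = range(1, 7)                                   # sample space of the die
    P = lambda pred: sum(1 for w in faces if pred(w)) / 6
    p_joint = P(lambda w: w == 1 and w % 2 == 1)          # P(X = 1, Y = 1) = 1/6
    p_odd = P(lambda w: w % 2 == 1)                       # P(Y = 1) = 1/2
    print(p_joint / p_odd)                                # 0.333... = 1/3
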
The idea of conditional probability extends naturally to the case when the distribution of a random
variable is conditioned on several variables, namely
P(X = a|Y = b, Z = c) = P(X = a, Y = b, Z = c)/P(Y = b, Z = c).

3.5 Independence
Two random variables X and Y are independent if FXY (x, y) = FX (x)FY (y) for all values of x
and y. For discrete random variables, this is equivalent to saying that

pXY (x, y) = pX (x)pY (y) ⇐⇒ pY |X (y|x) = pY (y) ∀x, y.

For continuous random variables, this is equivalent to saying that

fXY (x, y) = fX (x)fY (y) ⇐⇒ fY |X (y|x) = fY (y) ∀x, y.

Informally, two random variables X and Y are independent if “knowing” the value of one variable will
never have any effect on the conditional probability distribution of the other variable. That is, you
know all the information about the pair (X, Y ) by just knowing fX (x) and fY (y). The following lemma
formalizes this observation.
Lemma 1. If X and Y are independent, then for any subsets A, B ⊂ R we have

P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B).

Sometimes we also talk about conditional independence, meaning that if we know the value of a
random variable (or more generally, a set of random variables), then some other random variables will
be independent of each other. Formally, we say “X and Y are conditionally independent given Z” if

pX|Z (x|z) = pX|Y,Z (x|y, z) ⇐⇒ pX,Y |Z (x, y|z) = pX|Z (x|z)pY |Z (y|z).

3.6 Expectations and covariance


Suppose that we have two discrete random variables X, Y and g : R² → R is a function of these
two random variables. Then, the expected value of g is defined by
E[g(X, Y )] := Σ_{x∈Val(X)} Σ_{y∈Val(Y)} g(x, y) pXY (x, y).


For continuous random variables X, Y , the analogous expression is


E[g(X, Y )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) fXY (x, y) dx dy.

The covariance of two random variables X and Y is defined by

Cov[X, Y ] := E[(X − E[X])(Y − E[Y ])].

Using an argument similar to that for variance, we can rewrite this as

Cov[X, Y ] := E[(X − E[X])(Y − E[Y ])] = E[XY − XE[Y ] − Y E[X] + E[X]E[Y ]]
            = E[XY ] − E[X]E[Y ] − E[Y ]E[X] + E[X]E[Y ] = E[XY ] − E[X]E[Y ].

Properties:
• (Linearity of expectation) E[f (X, Y ) + g(X, Y )] = E[f (X, Y )] + E[g(X, Y )].
• Var[X + Y ] = Var[X] + Var[Y ] + 2Cov[X, Y ].
• If X and Y are independent, then Cov[X, Y ] = 0 and E[f (X)g(Y )] = E[f (X)]E[g(Y )].
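
A Monte Carlo sketch (independent Uniform[0, 1] pairs; seed and sample size are arbitrary choices) checks the last two properties numerically:

    import random

    random.seed(4)
    n = 200_000
    xs = [random.random() for _ in range(n)]
    ys = [random.random() for _ in range(n)]   # drawn independently of X
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var = lambda zs, m: sum((z - m) ** 2 for z in zs) / n
    var_sum = var([x + y for x, y in zip(xs, ys)], mx + my)
    print(round(cov, 4))                       # ≈ 0, since X and Y are independent
    print(round(var_sum, 4),                   # Var[X + Y] matches ...
          round(var(xs, mx) + var(ys, my) + 2 * cov, 4))  # ... Var[X] + Var[Y] + 2Cov[X, Y]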
