
A brief introduction to integrals

Integral

• Let's review the concept of the integral from calculus. We will not go
  into complicated calculations, just the intuition.
• We know that a function f(x) can be drawn as a curve, representing the
  value of the function at different x.
• Sometimes we are interested in the area under the curve between x = a
  and x = b.

[Figure: a curve y = f(x) with the area under it between a and b]

• Looks not easy to calculate, right? The top of the region is a curve,
  not a straight line.
Integral

If f(x) is flat, the area is just a rectangle.

The area is just height × width: h(b − a).


Integral

If f(x) is curved, we can approximate the area by many rectangles.

Split the interval [a, b] into several equal segments and draw a rectangle
on each.

The approximation becomes more accurate as you use more and more segments.
Integral

We use the mathematical notation

      S = ∫_a^b f(x) dx

to denote the integral of the function f(x) from x = a to x = b; dx
indicates that x is the variable of integration.

[Figure: the curve f(x) with the shaded area under it between a and b]

You are not required to calculate it, but just understand the meaning of
the notation.
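The rectangle picture above can be turned into a few lines of code. A minimal sketch in Python (the course's own computations use R; the helper name `riemann` is just for illustration):

```python
def riemann(f, a, b, n):
    """Approximate the area under f on [a, b] with n equal-width rectangles."""
    width = (b - a) / n
    # Left-endpoint heights: the sum of the rectangle areas approaches
    # the integral S = ∫_a^b f(x) dx as n grows.
    return sum(f(a + i * width) for i in range(n)) * width

area = riemann(lambda x: x * x, 0.0, 1.0, 10_000)  # true value is 1/3
```

With 10,000 segments the error is already below 0.001, matching the slide's point that more segments give a better approximation.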
Continuous random variable
One step further: from discrete to continuous

We learned about discrete random variables and several distributions in the
past few weeks.

• A discrete random variable has finitely many, or countably infinitely
  many, possible values.
• "Countably infinite" means that although there are infinitely many
  possible values, you can list them with indices 1, 2, 3, ..., as with
  the geometric or Poisson distribution.
• A continuous random variable has uncountably infinitely many possible
  values: basically, the real numbers.
• You cannot list the real numbers with indices 1, 2, ...; for example,
  in the list 1, 1.1, 1.11, 1.111, 1.1111, ..., where would you put 1.2?
Continuous random variable

A continuous random variable is a random variable that

• can take values in an uncountably infinite range, such as the real
  numbers;
• has P(X = x) = 0 for any specific value x;
• so instead, we measure the probability that X falls in a certain range,
  P(X ∈ [a, b]).

Examples

• The time at which a bus arrives.
• The sales in dollars of a department store.
• The height of a randomly selected individual.
Continuous vs discrete random variable

• The fundamental difference between continuous and discrete random
  variables is that a continuous R.V. takes values in the real numbers.
  You cannot list all its values with indices 1, 2, 3, ...

• Although a Poisson R.V. has infinitely many possible values, they are
  countable.

• Let's look at an example: what happens to a binomial distribution as n,
  the number of trials, increases?
Intuition: probability is the area under the curve

Let's begin with X ~ Binom(5, 0.5) and its probability mass function.

[Figure: PMF for n = 5; bars at the possible values 0 through 5]

P(X < 2) = ? It's the sum of two bars.
Intuition: probability is the area under the curve

Let's begin with X ~ Binom(20, 0.5) and its probability mass function.

[Figure: PMF for n = 20; bars at the possible values 0 through 20]

P(X < 10) = ? The sum of 10 bars.
Intuition: probability is the area under the curve

Let's begin with X ~ Binom(100, 0.5) and its probability mass function.

[Figure: PMF for n = 100; bars at the possible values 0 through 100]

P(X < 50) = ? 50 bars... It looks more and more like the area under a
curve from (0, 50), right? Recall how we approximate area by rectangles!
Intuition: probability is the area under the curve

Let's begin with X ~ Binom(1000, 0.5) and its probability mass function.

[Figure: PMF for n = 1000; bars at the possible values 0 through 1000]

P(X < 500) = ? As n increases, the number of possible values increases.
Each bar is thinner, and the probability looks like the area under a
curve from (0, 500).
Probability density function

• For a continuous random variable, we cannot construct a PMF: each
  specific value has zero probability, and we cannot enumerate all the
  values.

• Instead, we use a continuous, non-negative function fX(x) called the
  probability density function, or PDF, of X.

• The PDF is the continuous-variable equivalent of the PMF.
Probability density function

• More generally, for any subset B of the real line,

      P(X ∈ B) = ∫_B fX(x) dx

• You are not required to calculate the integral, but keep in mind: the
  area under the PDF represents probability.
PMF vs. PDF

Loosely speaking, as you have more and more (and more and more) potential
outcomes, the PMF becomes a PDF.
Properties of PDF

• Note that fX(a) is not P(X = a)! For continuous variables,
  P(X = a) = 0 all the time:

      P(X = a) = ∫_a^a fX(x) dx = 0

• This means that

      P(X ≤ a) = P(X < a) + P(X = a) = P(X < a)

• fX(x) is non-negative: fX(x) ≥ 0.
• For any B, P(X ∈ B) = ∫_{x ∈ B} fX(x) dx ≥ 0.
• The total area under the curve is 1:

      ∫_{−∞}^{∞} fX(x) dx = 1
Intuition

• We can think of the probability that X falls in a small interval
  [x, x + δ] of length δ.
• P(X ∈ [x, x + δ]) = ∫_x^{x+δ} fX(t) dt ≈ fX(x) × δ
• As δ → 0, P(X ∈ [x, x + δ]) → P(X = x).
• However, on the other hand, fX(x) × δ → 0 as well.
• Again, fX(x) is not the probability P(X = x)!
Example: continuous uniform random variable

• I know a bus is going to arrive some time in the next hour, but I
don’t know when. If I assume all times within that hour are equally
likely, what will my PDF look like?
• The PDF is

      fX(x) = 1   if 0 ≤ x ≤ 1
            = 0   otherwise
Example: continuous uniform random variable

      fX(x) = 1   if 0 ≤ x ≤ 1
            = 0   otherwise

• What is P(X > 0.5)? 0.5
• What is P(X > 1.5)? 0
• What is P(X = 0.3)? 0
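These three answers can be sanity-checked by simulation; a Python sketch (the sample size and seed are arbitrary choices):

```python
import random

random.seed(0)
draws = [random.random() for _ in range(100_000)]  # X ~ Uniform(0, 1)

p_gt_half = sum(x > 0.5 for x in draws) / len(draws)  # ≈ 0.5
p_gt_15 = sum(x > 1.5 for x in draws) / len(draws)    # 0: outside the support
p_eq_03 = sum(x == 0.3 for x in draws) / len(draws)   # 0: a single point has no mass
```

The empirical frequency of X > 0.5 hovers near 0.5, while an exact hit on 0.3 essentially never happens.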
Continuous uniform random variable

More generally, X is a continuous uniform random variable if its PDF is

      fX(x) = c   if a ≤ x ≤ b
            = 0   otherwise

What is c?
Continuous uniform random variable

More generally, X is a continuous uniform random variable if its PDF is

      fX(x) = c   if a ≤ x ≤ b
            = 0   otherwise

What is c?
• Remember, the area under the curve should be 1.
• Here, the area is (b − a) × c.
• So c must be 1/(b − a).
Continuous uniform random variable

More generally, X is a continuous uniform random variable if its PDF is

      fX(x) = 1/(b − a)   if a ≤ x ≤ b
            = 0           otherwise
Cumulative distribution function

• Often we are interested in P(X ≤ x). For example:
  • What is the probability that the bus arrives before 1:30?
  • What is the probability that a randomly selected person is under
    175 cm?
  • What is the probability that this month's rainfall is less than
    3 inches?
• We can get it from the probability mass or density function:

      FX(x) = P(X ≤ x) = Σ_{t ≤ x} pX(t)       if X is discrete
                       = ∫_{−∞}^x fX(t) dt     if X is continuous

• This is the cumulative distribution function (CDF) of X.
• If we know P(X ≤ x), we know P(X > x) = 1 − P(X ≤ x).
Cumulative distribution function

• If X is discrete, FX(x) is a step function.
• FX(x) = Σ_{t ≤ x} pX(t).
Cumulative distribution function

• If X is continuous, FX(x) is continuous and monotonically
  non-decreasing.
• If x ≤ y, we must have FX(x) ≤ FX(y).
• FX(x) → 0 as x → −∞; FX(x) → 1 as x → ∞.
• FX(x) = ∫_{−∞}^x fX(t) dt. It is the area under the probability density
  function up to x.
Expectation of a continuous random variable

We saw expectation for discrete random variables.

• For discrete X,
      E[X] = Σ_x x pX(x)
• We can also think of the expectation of a continuous random variable:
  the number we would expect to get, on average, if we repeated our
  experiment infinitely many times.
• What do you think the expectation of a continuous random variable is?
• E[X] = ∫_{−∞}^{∞} x fX(x) dx.
• Similar to the discrete case: we just replace the summation by an
  integral.
Expectation of functions of a continuous r.v.

• What do you think the expectation of a function g(X) of a continuous
  random variable is?
• Similar to the discrete case:

      E[g(X)] = ∫_{−∞}^{∞} g(x) fX(x) dx

• So we can calculate the mean, the second moment, and thus the variance:

      Var[X] = E[X²] − (E[X])²

• Also,
      E[aX + b] = a E[X] + b
      Var[aX + b] = a² Var[X]
  are always true.
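"Replace the sum by an integral" can be seen numerically. A Python sketch that approximates E[g(X)] with a midpoint Riemann sum, using the Uniform(0, 1) density from the earlier example (the helper name `expect` is just for illustration):

```python
def expect(g, f, lo, hi, n=100_000):
    """Midpoint Riemann sum for E[g(X)] = ∫ g(x) f(x) dx over [lo, hi]."""
    w = (hi - lo) / n
    mids = (lo + (i + 0.5) * w for i in range(n))
    return sum(g(m) * f(m) for m in mids) * w

uniform_pdf = lambda x: 1.0  # fX(x) = 1 on [0, 1]

mean = expect(lambda x: x, uniform_pdf, 0, 1)               # ≈ 1/2
var = expect(lambda x: x * x, uniform_pdf, 0, 1) - mean**2  # ≈ 1/12
```

The results match the known mean 1/2 and variance 1/12 of the Uniform(0, 1) distribution.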
The standard normal distribution

• The normal distribution is the classic "bell-shaped curve". We refer to
  Z ~ N(0, 1) as the standard normal distribution.
• People like to use the letter Z to refer to the standard normal
  distribution.

• Further, it has a number of nice properties, such as symmetry. For the
  standard normal, P(Z ≤ −2) = P(Z ≥ 2).
The standard normal distribution

• If Z ~ N(0, 1),
      P(−1 < Z < 1) = 0.68
      P(−1.96 < Z < 1.96) = 0.95
• Remember this!

• For simplicity, we write P(−2 < Z < 2) ≈ 0.95.
The standard normal distribution

• What's P(Z < 2)?
• What's P(Z ≤ 2)?
• What's P(Z < 0)?

Use the symmetry of the normal distribution to simplify the calculation!
You do not need to calculate the integral directly!
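Python's standard library can answer these directly via `statistics.NormalDist` (the course uses R and normal tables; this is just a cross-check sketch):

```python
from statistics import NormalDist

Z = NormalDist(0, 1)  # the standard normal

p1 = Z.cdf(2)    # P(Z < 2); equals P(Z <= 2) since P(Z = 2) = 0
p2 = Z.cdf(0)    # P(Z < 0) = 0.5 by symmetry
p3 = Z.cdf(-2)   # by symmetry, P(Z < -2) = 1 - P(Z < 2)
```

Note that P(Z < 2) and P(Z ≤ 2) are the same number, exactly as the earlier slide on P(X ≤ a) = P(X < a) promised.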
The normal distribution

• The standard normal is not that useful by itself. When we say "the
  normal distribution", we usually refer to a family of distributions.
• We obtain the PDFs of this normal family by shifting the standard
  normal curve around and spreading it out (or tightening it up).
The normal distribution

• This is the most popular and important continuous random variable. A
  normal, or Gaussian, random variable has PDF

      fX(x) = (1 / (σ √(2π))) e^{−(x − µ)² / (2σ²)}

  where the parameter µ is the mean and σ > 0 is the standard deviation.
• We write it as X ~ N(µ, σ²).
• E[X] = µ, Var[X] = σ².
Linear transformation of normal distribution

• Let X ~ N(µ, σ²).
• Let Y = aX + b.
• So E[Y] = a E[X] + b = aµ + b.
• Var[Y] = a² Var[X] = a²σ².
• In fact, if Y = aX + b, then Y is also a normal random variable:

      Y ~ N(aµ + b, a²σ²)

• This is a special property of the normal distribution: a linear
  transformation of a normal random variable is still normal.
• Other random variables generally do not keep their distribution under a
  linear transformation, although you can still calculate the mean and
  variance of Y.
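The mean and variance of Y = aX + b are easy to probe by simulation. A Python sketch (the constants a, b and the sample size are arbitrary choices):

```python
import random
from statistics import fmean, pvariance

random.seed(1)
a, b = 3.0, 5.0
x = [random.gauss(0.0, 1.0) for _ in range(200_000)]  # X ~ N(0, 1)
y = [a * xi + b for xi in x]                          # Y = aX + b

m = fmean(y)      # should be near a*0 + b = 5
v = pvariance(y)  # should be near a^2 * 1 = 9
```

Matching moments alone does not prove normality; a histogram of y would show that the bell shape is preserved as well.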
The normal distribution

• µ is the mean; it determines where the curve is. The center is at µ.
• σ is the standard deviation; it determines how widely the curve spreads
  around µ, for X ~ N(µ, σ²).
• The area under the curve in the interval (µ − 2σ, µ + 2σ) is about 95%:
      P(µ − 2σ < X < µ + 2σ) ≈ 0.95
• The area under the curve in the interval (µ − σ, µ + σ) is about 68%:
      P(µ − σ < X < µ + σ) ≈ 0.68
The normal distribution

• Example: below are the PDFs of X1 ~ N(0, 1), X2 ~ N(3, 1), and
  X3 ~ N(0, 16).
• Which curve represents which variable?
Example: normal distribution

• Assume the annual returns of the S&P 500 index are normally distributed
  with mean 6% and standard deviation 15%.
• We are interested in two questions:
  • What's the probability of losing money next year?

        P(SP500 < 0) = ?

  • What's the value such that there is only a 2% chance of losing more
    than that amount?

        P(SP500 < ?) = 0.02
Example: normal distribution

> pnorm(0, mean = 6, sd = 15)
[1] 0.3445783
> qnorm(0.02, mean = 6, sd = 15)
[1] -24.80623

• (1) P(SP500 < 0) ≈ 0.34, (2) P(SP500 < −24.8) = 0.02.
• In Excel: use NORM.DIST and NORM.INV to calculate these.
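For readers without R or Excel, `statistics.NormalDist` in Python's standard library gives the same two numbers:

```python
from statistics import NormalDist

sp500 = NormalDist(mu=6, sigma=15)  # annual return, in percent

p_lose = sp500.cdf(0)         # like pnorm(0, 6, 15): about 0.3446
cutoff = sp500.inv_cdf(0.02)  # like qnorm(0.02, 6, 15): about -24.8
```

`cdf` answers "what probability lies below this value?" and `inv_cdf` the reverse, mirroring R's pnorm/qnorm pair.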
Standardization

• It is hard to compute probabilities of the normal distribution in
  general: they vary across different µ and σ².
• But we can transform an arbitrary normal variable into the standard
  normal variable to simplify the calculation.
• We call it standardization: make it standard.
• Let X ~ N(µ, σ²).
• We know

      Z = (X − µ) / σ ~ N(0, 1)

• Why? Recall that a linear transformation of a normal is still normal,
  and you can calculate the mean and variance after the transformation.
• If you plug in a specific value X = x, we call z = (x − µ)/σ a z-score.
Standardization

• Let X ~ N(µ, σ²).
• We know

      Z = (X − µ) / σ ~ N(0, 1)

      E[Z] = E[(X − µ)/σ] = (E[X] − µ)/σ = (µ − µ)/σ = 0

      Var[Z] = Var[(X − µ)/σ] = (1/σ²) Var[X] = σ²/σ² = 1

• Therefore Z ~ N(0, 1).
Another example

• The normal is a family of random variables, which makes comparison hard.
• Let's say
      X ~ N(0, 2²),  Y ~ N(−2, 5²)
  Which of the following is larger?
      P(X < 1),  P(Y < −1)
• Standardize both variables: let
      Z1 = X / 2,  Z2 = (Y + 2) / 5
• The corresponding z-scores are
      z1 = 1/2,  z2 = (−1 + 2)/5 = 1/5
• We know Z1, Z2 ~ N(0, 1). Of P(Z < 0.5) and P(Z < 0.2), which one is
  larger?
• P(Z < 0.5) is larger; think about the normal curve.
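The comparison can be double-checked without standardizing by hand, again with `statistics.NormalDist`:

```python
from statistics import NormalDist

p_x = NormalDist(0, 2).cdf(1)    # P(X < 1), z-score 1/2
p_y = NormalDist(-2, 5).cdf(-1)  # P(Y < -1), z-score 1/5

larger_is_px = p_x > p_y  # the z-score 0.5 leaves more area to its left
```

Both routes agree: the larger z-score corresponds to the larger probability, which is the whole point of standardizing.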
How to use standardization

Prior to the 1987 stock market crash, monthly S&P 500 returns r followed
approximately a normal distribution with mean 0.012 and standard deviation
0.043.

The S&P 500 dropped by 0.2176 (a return of −0.2176) in one month during
the crash. How extreme is that?
• r ~ N(0.012, 0.043²)
• It is hard to tell how extreme it is just from those numbers, so let's
  standardize the normal.
• We know Z = (r − 0.012) / 0.043 ~ N(0, 1).
• Where is the crash? Calculate the corresponding z-score:

      z = (−0.2176 − 0.012) / 0.043 ≈ −5.34

• It is more than 5 standard deviations away from the mean!
• Recall that 95% of the mass is within 2 standard deviations; this event
  is very extreme.
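The z-score arithmetic, sketched in Python with the numbers from the slide:

```python
mu, sigma = 0.012, 0.043  # pre-crash monthly mean and standard deviation
r_crash = -0.2176         # the crash month's return

z = (r_crash - mu) / sigma  # about -5.34 standard deviations below the mean
```

A z-score this far into the left tail corresponds to a vanishingly small probability under the assumed normal model.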
More about standardization

If X ~ N(3, 4), what is P(X < 1)?

• First we standardize: Z = (X − µ)/σ = (X − 3)/2.
• Next we calculate the z-score: z = (1 − 3)/2 = −2.
• Then we can translate our question to the standard normal distribution:

      P(X < 1) = P((X − µ)/σ < (1 − µ)/σ) = P(Z < −2)

• Problem: how do you know P(Z < −2)? Recall that
  P(−2 ≤ Z ≤ 2) = 0.95 and the curve is symmetric.
• P(Z < −2) + P(−2 ≤ Z ≤ 2) + P(Z > 2) = 1, and
  P(Z < −2) = P(Z > 2) due to symmetry.
• We have P(Z < −2) = (1 − 0.95)/2 = 0.025.
The standard normal

• In general, it is hard to calculate P(Z < a) for an arbitrary a, even
  if Z is a standard normal distribution.
• People made a standard normal table for quick reference.
• It gives you the value of P(Z < a); for example,

      P(Z < 0.21) = 0.5832,  P(Z < 0.33) = 0.6293

  The row gives the first digits of a and the column the second digit.
Galton board and normal distribution

You can find the normal distribution everywhere in the real world!

https://www.youtube.com/watch?v=AwEaHCjgeXk&t=231s
Galton board and normal distribution

• At each level, each ball goes left or right with equal probability:
  Bernoulli!
• The result after many levels: Binomial!
• The normal distribution is a good approximation to the Binomial when
  p = 1/2.
Normal approximates binomial when p = 0.5

• Revisit tossing fair coins: X is the number of heads you get out of 100
  tosses, X ~ Bin(100, 0.5).
• It is too tedious to compute P(X ≤ 25): you need to calculate the
  probability for each of X = 0, 1, ..., 25.
• But we know E[X] = np = 50 and Var[X] = np(1 − p) = 25.
• Approximate X by a normal with the same mean and variance:
  Y ~ N(50, 5²).
• P(X = 25) ≈ P(Y ∈ [24.9, 25.1])

  > pnorm(25, mean = 50, sd = 5)
  [1] 2.866516e-07
  > pbinom(25, 100, 0.5)
  [1] 2.818141e-07

You can calculate those numbers in Excel. Have a try!
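The same comparison in Python, computing the exact binomial sum and the normal approximation side by side (stdlib only; it reproduces the R output above):

```python
from math import comb
from statistics import NormalDist

# Exact: P(X <= 25) for X ~ Bin(100, 0.5), like pbinom(25, 100, 0.5)
exact = sum(comb(100, k) * 0.5**100 for k in range(26))

# Approximation: P(Y <= 25) for Y ~ N(50, 5^2), like pnorm(25, 50, 5)
approx = NormalDist(mu=50, sigma=5).cdf(25)
```

The two tail probabilities agree to within a couple of percent even this far out in the tail, which is why the normal approximation is so convenient for p = 1/2.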
