
LECTURE 1

§1. INTRODUCTION.

This course is an introduction to Bayesian statistics. Bayesian statistics is a branch of statistics based on the Bayesian interpretation of probability, in which probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. Bayesian statistical methods use Bayes theorem to compute and update probabilities after obtaining new data. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. Frequentist methods assume that unknown parameters are fixed constants, and they define probability by using limiting relative frequencies. It follows from these assumptions that probabilities are objective and that you cannot make probabilistic statements about parameters because they are fixed. Bayesian methods offer an alternative approach; they treat parameters as random variables and define probability as a degree of belief (that is, the probability of an event is the degree to which you believe the event is true). It follows from these postulates that probabilities are subjective and that you can make probability statements about parameters. The term Bayesian comes from the prevalent use of Bayes theorem, which was named after Thomas Bayes (1702 – 1761).

Bayes theorem describes the conditional probability of an event based on data as well
as prior information or beliefs about the event or conditions related to the event. For
example, in Bayesian inference Bayes theorem can be used to estimate the parameters of
a probability distribution or statistical model. Since Bayesian statistics treats probability
as a degree of belief, Bayes theorem can directly assign a probability distribution that
quantifies the belief to the parameter or set of parameters.

According to the frequentist approach, an unknown population parameter is a fixed, nonrandom quantity. It is assumed that there is one true population parameter. As a consequence, no probability statements can be made about its value. In the Bayesian
view, in contrast, the true value of a population parameter is conceived as uncertain and
is therefore considered a random variable. According to the Bayesian approach, the un-
known, random population parameter should be described by a probability distribution.
Bayesian philosophy states that θ cannot be determined exactly, and uncertainty about
the parameter is expressed through probability statements and distributions. You can
say that θ follows a normal distribution with mean 0 and variance 1, if it is believed that
this distribution best describes the uncertainty associated with the parameter.

Unlike the frequentist approach, the Bayesian counterpart allows a probability statement
to be made about the value of an unknown parameter. Both approaches also differ in
their notion of probability. Frequentist procedures are based on a concept of probability
that is associated with the idea of long-run frequency (e.g., a coin toss). Frequentist
inference, which employs sampling distributions based on infinite repeated sampling, is
focused on the performance over all possible random samples. Therefore, a frequentist
probability statement does not relate to a particular random sample that was obtained.
Rather, the sampling distribution, which describes the probability distribution of the
sample statistic over all possible random samples from the population, is used to make
a confidence statement about the unknown population parameter. The name confidence
statement is chosen because the inference probability is based on all possible datasets
that could have occurred for the fixed but unknown population parameter.

The Bayesian approach, in contrast, has a different interpretation of probability.


According to this view, a probability statement about an unknown parameter mirrors
a subjective degree of belief or experience of uncertainty. This uncertainty is captured
by a probability distribution that is defined before observing the data. In the Bayesian
terminology, this particular distribution is called the prior distribution, or simply
prior. The idea of a prior is best described as being analogous to placing a bet. The bet
comprises the amount of certainty that a bettor has about a random outcome before knowing the outcome's realization. The Bayesian approach provides a mathematical rule, called Bayes theorem, describing how to change existing prior beliefs about the value of an unknown random parameter in the light of new evidence, such as empirical (sample)
data. The data can be expressed in terms of a likelihood function, sometimes simply
called the likelihood. Using Bayes theorem as a formal rule to weigh the likelihood of
the actually observed data against the beliefs held before observing the data gives the pos-
terior distribution. The posterior distribution allows researchers to make probability
statements concerning the unknown parameter of interest.

§2. BAYESIAN PRINCIPLE.

Bayes theorem contains three essential elements. It combines a prior state of knowledge with the data likelihood to produce a more informed posterior distribution, that is:
Posterior information = prior information + data information

In the classical approach, the parameter θ is assumed to be an unknown, but fixed


quantity. A random sample X1 , X2 , . . . , Xn is drawn from a population with probability
density function f (x, θ) and based on observed values in the sample, knowledge about
the value of θ is obtained.
In Bayesian approach θ is considered to be a quantity whose variation can be de-
scribed by a probability distribution (known as a prior distribution). This is a subjective
distribution, based on the experimenter’s belief, and is formulated before the data are
seen (and hence the name prior distribution). A sample is then taken from a population
where θ is a parameter and the prior distribution is updated with this sample informa-
tion. This updated prior is called the posterior distribution. The updating is done with
the help of Bayes theorem and hence the name Bayesian method.
The frequentist and Bayesian approaches to statistics differ in the definition of Prob-
ability. For a Frequentist, probability is the relative frequency of the occurrence of an
event in a large set of repetitions of the experiment (or in a large ensemble of identical
systems) and is, as such, a property of a so-called random variable. In Bayesian statis-
tics, on the other hand, probability is not defined as a frequency of occurrence but as
the plausibility that a proposition is true, given the available information. Probabilities are then – in the Bayesian view – not properties of random variables but a quantitative encoding of our state of knowledge about these variables. This view has far-reaching consequences when it comes to data analysis, since Bayesians can assign probabilities to propositions, or hypotheses, while Frequentists cannot.
The classical methods of estimation that you have studied are based solely on in-
formation provided by the random sample. These methods essentially interpret proba-
bilities as relative frequencies. For example, in arriving at a 95% confidence interval for
the mean, we interpret the statement

P (−1.96 < Z < 1.96) = 0.95

to mean that 95 percent of the time in repeated experiments Z (a standard normal random variable) will fall between −1.96 and 1.96. Probabilities of this type that can
be interpreted in the frequency sense will be referred to as objective probabilities. The
Bayesian approach to statistical methods of estimation combines sample information
with other available prior information that may appear to be pertinent. The probabilities
associated with this prior information are called subjective probabilities, in that they
measure a person's degree of belief in a proposition. The person uses his own experience
and knowledge as the basis for arriving at a subjective probability.

§3. PRIOR AND POSTERIOR DISTRIBUTIONS

Example 1. A disease occurs with prevalence γ in a population, and θ = 1 indicates that an individual has the disease. Hence

P (θ = 1) = γ, P (θ = 0) = 1 − γ.

A diagnostic test gives a result Y , whose distribution function is F1 (y) for a diseased
individual, and F0 (y) otherwise. The most common type of test declares that a person
is diseased if Y > y0 , where y0 is fixed on the basis of past data.

The probability that a person is diseased, given a positive test result, is


P (θ = 1 | Y > y0 ) = γ [1 − F1 (y0 )] / { γ [1 − F1 (y0 )] + (1 − γ) [1 − F0 (y0 )] }.
This is sometimes called the positive predictive value of the test. Its sensitivity and specificity are 1 − F1 (y0 ) and F0 (y0 ), respectively.
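As a small numerical illustration (added here; the figures are assumed, not taken from the lecture), the positive predictive value follows directly from Bayes theorem once prevalence, sensitivity and specificity are given. A minimal Python sketch:

    # Positive predictive value via Bayes theorem.
    # The numbers below are illustrative assumptions only.
    prevalence = 0.01     # gamma = P(theta = 1)
    sensitivity = 0.95    # 1 - F1(y0) = P(Y > y0 | diseased)
    specificity = 0.90    # F0(y0)     = P(Y <= y0 | healthy)

    numerator = prevalence * sensitivity
    denominator = numerator + (1 - prevalence) * (1 - specificity)
    ppv = numerator / denominator
    print(f"P(theta = 1 | Y > y0) = {ppv:.3f}")   # about 0.088 for these numbers

Even a fairly accurate test yields a small posterior probability of disease when the prevalence is low, which is exactly what the formula above expresses.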

In the more general case, θ can take a finite number of values, labeled 1, 2, ..., k. We can
assign to these values probabilities p1 , p2 , ..., pk which express our beliefs about θ before
we have access to the data. The data y are assumed to be the observed value of a
(multidimensional) random variable Y , and p(y/θ) the density of y given θ (the likelihood
function).

Then the conditional probabilities


P (θ = j | Y = y) = pj p(y | θ = j) / Σ_{i=1}^{k} pi p(y | θ = i),   j = 1, 2, ..., k,

summarize our beliefs about θ after we have observed Y .

The unconditional probabilities p1 , p2 , ..., pk are called prior probabilities and

P (θ = 1/Y = y) , P (θ = 2/Y = y) , ..., P (θ = k/Y = y)

are called posterior probabilities of θ.
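For a finite parameter space the posterior probabilities are obtained by normalising the products pj · p(y | θ = j). The sketch below (an added illustration with made-up prior weights and likelihood values) shows the computation:

    import numpy as np

    # Assumed prior over k = 3 parameter values.
    prior = np.array([0.5, 0.3, 0.2])
    # Assumed likelihood p(y | theta = j) of the observed data under each value.
    likelihood = np.array([0.10, 0.40, 0.25])

    unnormalised = prior * likelihood
    posterior = unnormalised / unnormalised.sum()
    print(posterior)   # posterior probabilities P(theta = j | Y = y); they sum to 1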

In Bayesian statistical inference, a prior probability distribution, often simply called the
prior, of an uncertain quantity is the probability distribution that would express one’s
beliefs about this quantity before some evidence is taken into account. For example, the
prior could be the probability distribution representing the relative proportions of voters
who will vote for a particular politician in a future election. The unknown quantity may
be a parameter of the model or a latent variable rather than an observable variable.
Prior probability, in Bayesian statistics, is the probability of an event before new data
is collected. This is the best rational assessment of the probability of an outcome based
on the current knowledge before an experiment is performed.

When θ can take values continuously on some interval, we can express our beliefs about
it with a prior density p(θ). After we have obtained the data y, our beliefs about θ are
contained in the conditional density,
p(θ | y) = p(θ) · f (y | θ) / [ ∫ p(θ) · f (y | θ) dθ ],    (1)

called posterior density.

Since θ is integrated out in the denominator, the denominator can be considered as a constant with respect to θ. Therefore, the Bayes formula in (1) is often written as

p(θ/y) ∝ p(θ) · f (y/θ), (2)

which denotes that p(θ/y) is proportional to p(θ) · f (y/θ).
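Formula (1) can be approximated numerically by evaluating the prior and the likelihood on a grid of θ values and normalising. The sketch below is an added illustration using a normal prior and a single normally distributed observation (both assumptions made only for the example):

    import numpy as np
    from scipy import stats

    theta = np.linspace(-5, 5, 2001)                     # grid over the parameter
    dtheta = theta[1] - theta[0]
    prior = stats.norm.pdf(theta, loc=0, scale=1)        # p(theta): N(0, 1) prior (assumed)
    y = 1.2                                              # one observed data point (assumed)
    likelihood = stats.norm.pdf(y, loc=theta, scale=1)   # f(y | theta)

    unnormalised = prior * likelihood
    posterior = unnormalised / (unnormalised.sum() * dtheta)   # divide by the integral
    print(posterior.sum() * dtheta)                            # approximately 1

For this particular normal–normal case the exact posterior is N(y/2, 1/2), and the grid approximation reproduces it.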

§4. The proportionality formula.


Bayesian inference requires determination of the posterior probability distribution
of θ. This task is equivalent to finding the posterior density function of θ, which may be
done using the equation
p(θ | y) = p(θ) f (y | θ) / f (y).
Here, f (y) is the unconditional (or prior) marginal density function of y, as given by
f (y) = ∫ f (y | θ) p(θ) dθ if θ is continuous, and f (y) = Σ_θ f (y | θ) p(θ) if θ is discrete.    (3)
Observe that f (y), as given by (3), is a constant with respect to θ, which means that we may also write the Bayes equation as

p(θ|y) = c p(θ) f (y|θ),

where
c = 1 / f (y).
We may also write
p(θ|y) ∝ p(θ) f (y|θ),

where ∝ is the proportionality sign.

Another way to express the last equation is

p(θ|y) ∝ p(θ) × L(y1 , y2 , ..., yn |θ),

where L(y1 , y2 , ..., yn |θ) is the likelihood function (defined as the model density
f (y1 , y2 , ..., yn |θ) multiplied by any constant with respect to θ, and viewed as a function of
θ rather than of (y1 , y2 , ..., yn )).

The last equation may also be stated in words as:
The posterior is proportional to the prior times the likelihood.
These observations indicate a shortcut method for determining the required poste-
rior distribution which obviates the need for calculating f (y1 , y2 , ..., yn ) (which may be
difficult).

This method is to multiply the prior density (or the kernel of that density) by the
likelihood function and try to identify the resulting function of θ as the density of a
well-known or common distribution. Once the posterior distribution has been identified,
f (y1 , y2 , ..., yn ) may then be obtained easily as the associated normalising constant.
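A standard textbook illustration of this shortcut (not worked out in the lecture, so the numbers below are only assumptions) is a Beta(a, b) prior combined with a binomial likelihood: the product of the kernels, θ^{a−1}(1 − θ)^{b−1} · θ^{y}(1 − θ)^{n−y}, is recognised as the kernel of a Beta(a + y, b + n − y) density, so the normalising constant never has to be computed explicitly. A brief sketch:

    from scipy import stats

    a, b = 2.0, 2.0     # Beta prior parameters (assumed)
    n, y = 20, 14       # n trials, y successes (made-up data)

    # Prior kernel times likelihood kernel is a Beta(a + y, b + n - y) kernel.
    posterior = stats.beta(a + y, b + n - y)
    print(posterior.mean())            # posterior mean of theta
    print(posterior.interval(0.95))    # central 95% posterior interval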

§5. DISTRIBUTION FUNCTIONS

Definition 1. Let (Ω, F, P ) be a probability space, i. e. Ω is a sample space on which


a probability P has been defined. A random variable is a function η from Ω to the set of
real numbers
η : Ω → IR1 ,

i. e. for every outcome ω ∈ Ω there is a real number, denoted by η(ω), which is called the
value of η(·) at ω.

We can also give the following definition of distribution function.


The distribution function F of a random variable η(ω) is defined for all real numbers
x ∈ IR1 , by the formula
F (x) = P (ω : η(ω) ≤ x). (4)

In words, F (x) denotes the probability that the random variable η(ω) takes on a value
that is less than or equal to x.
Some properties of the distribution function are the following:

Property 1. F is a nondecreasing function, that is, if x1 ≤ x2 then F (x1 ) ≤ F (x2 ).

Property 2. F (x) → 1 as x → +∞.

Property 3. F (x) → 0 as x → −∞.

Property 4. F (x) is right continuous. That is, for any x and any decreasing sequence
xn that converges to x,
lim F (xn ) = F (x).
n→∞

Thus, Properties 1 – 4 are necessary conditions for a function G(x) to be a distribution function.
However, these properties are also sufficient. This assertion follows from the following
theorem which we cite without proof.

Theorem 1 (about Distribution Function). Let a function G(x), x ∈ IR1 satisfy the Properties 1 — 4. Then there exists a probability space (Ω, F, P ) and a random variable η(ω) whose distribution function coincides with the given function G(x), i. e.

P (ω : η(ω) ≤ x) = G(x).

Therefore, to give an example of a random variable it suffices to exhibit a function which satisfies the Properties 1 — 4.
We want to stress that in the Theorem about the distribution function the random variable η(ω) is not determined uniquely (see Appendix-2).

Definition 2. Two random variables η1 (ω) and η2 (ω) are said to be Identically Distributed
if their distribution functions are equal, that is,

Fη1 (x) = Fη2 (x) for all x ∈ IR1 .

Example 2. A random variable η(ω) is said to be Normally distributed if its distribution


function has the following form
F (x) = (1 / (σ √(2π))) ∫_{−∞}^{x} exp( −(y − a)² / (2σ²) ) dy,    (5)

where a and σ are constants, moreover a ∈ IR1 and σ > 0.


In order to show the correctness of Example 2 we have to verify that the function on the right-hand side of (5) satisfies the Properties 1 — 4. The verification can be found in Appendix-1 of this lecture.

The normal distribution plays a central role in probability and statistics. This distribu-
tion is also called the Gaussian distribution after Carl Friedrich Gauss, who proposed it
as a model for measurement errors.
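As a quick numerical sanity check (an added illustration, with a = 0 and σ = 1 assumed), Properties 1 – 4 can be verified for the function (5) using the normal distribution function available in scipy:

    import numpy as np
    from scipy import stats

    F = lambda x: stats.norm.cdf(x, loc=0, scale=1)   # F(x) from (5) with a = 0, sigma = 1

    x = np.linspace(-10, 10, 1001)
    assert np.all(np.diff(F(x)) >= 0)          # Property 1: nondecreasing
    assert abs(F(50.0) - 1) < 1e-12            # Property 2: F(x) -> 1 as x -> +infinity
    assert F(-50.0) < 1e-12                    # Property 3: F(x) -> 0 as x -> -infinity
    assert abs(F(1e-12) - F(0.0)) < 1e-9       # Property 4: right continuity (numerically)
    print("Properties 1 - 4 hold numerically")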

Example 3. The Poisson Random Variable. A discrete random variable η(ω),


taking on one of the values 0, 1, 2, ... is said to be a Poisson random variable with parameter
λ if for some λ > 0,

p(n) = P {ω : η(ω) = n} = (λ^{n} / n!) e^{−λ},    n = 0, 1, 2, ...
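A brief added check that these probabilities define a valid distribution and agree with scipy's built-in Poisson pmf (λ = 3 is an arbitrary choice):

    import math
    from scipy import stats

    lam = 3.0
    p = lambda n: lam**n / math.factorial(n) * math.exp(-lam)

    print(sum(p(n) for n in range(100)))        # approximately 1
    print(p(5), stats.poisson.pmf(5, mu=lam))   # the two values agree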

Example 4. A random variable is said to be Uniformly distributed on the interval (a, b)


if its distribution function is given by
F (x) = 0 if x ≤ a,   (x − a)/(b − a) if a ≤ x ≤ b,   1 if x ≥ b.    (6)

It is obvious that the function (6) satisfies all Properties 1 — 4.

Example 5. A random variable is said to be Exponentially distributed with parameter


λ>0 if its distribution function is given by

F (x) = 0 if x ≤ 0,   1 − e^{−λx} if x ≥ 0.    (7)

It is obvious that the function (7) satisfies all Properties 1 — 4.


Like the Poisson distribution, the exponential distribution depends on a single parameter.

Example 6. If η(ω) ≡ c then the corresponding distribution function has the form

F (x) = 0 if x < c,   1 if x ≥ c.

Consider the experiment of flipping a symmetrical coin once. The two possible outcomes
are “heads” (outcome ω1 ) and “tails” (outcome ω2 ), that is, Ω = {ω1 , ω2 }. Suppose η(ω) is
defined by putting η(ω1 ) = 1 and η(ω2 ) = −1. We may think of it as the earnings of a player who receives or loses a dollar according to whether the outcome is heads or tails. The corresponding distribution function has the form
F (x) = 0 if x < −1,   1/2 if −1 ≤ x < 1,   1 if x ≥ 1.
§6. CONTINUOUS RANDOM VARIABLES
We say that η(ω) is an absolutely continuous random variable if there exists a function f (x), defined for all real numbers, such that the distribution function F (x) of the random variable η(ω) can be represented in the form
F (x) = ∫_{−∞}^{x} f (y) dy.    (8)

The function f is called the Density function of η(ω).

A function f (x) must have certain properties in order to be a density function. Since
F (x) → 1 as x → +∞ we obtain

Property 1.
∫_{−∞}^{+∞} f (x) dx = 1.    (9)

Property 2. f (x) is a nonnegative function.

Proof: Differentiating both sides of (8) yields


f (x) = dF (x) / dx.    (10)
That is, the density is the derivative of the distribution function. We know that the
first derivative of a nondecreasing function is always nonnegative. Therefore the proof
is complete as F (x) is nondecreasing.
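Relation (10) is easy to check numerically for a concrete distribution. The added sketch below compares a finite-difference derivative of the exponential distribution function with its density, with λ = 2 chosen arbitrarily:

    from scipy import stats

    lam = 2.0
    x, h = 0.7, 1e-6
    F = lambda t: stats.expon.cdf(t, scale=1/lam)       # F(t) = 1 - exp(-lam * t) for t > 0
    numeric_density = (F(x + h) - F(x - h)) / (2 * h)   # central difference dF/dx
    exact_density = stats.expon.pdf(x, scale=1/lam)     # f(x) = lam * exp(-lam * x)
    print(numeric_density, exact_density)               # nearly equal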

Remarkably, these two properties are also sufficient for a function g(x) to be a density function.

Theorem 2 (About Density Function). Let a function g(x), x ∈ IR1 satisfy (9) and, in addition, satisfy the condition g(x) ≥ 0 for all x ∈ IR1 . Then there exists a probability space (Ω, F, P ) and an absolutely continuous random variable η(ω) whose density function coincides with the given function g(x).

Therefore, to give an example of an absolutely continuous random variable it suffices to exhibit a nonnegative function which satisfies (9).
The normally distributed random variable is absolutely continuous and its density func-
tion has the form
f (x) = (1 / (σ √(2π))) exp( −(x − a)² / (2σ²) ),    (11)
where a and σ are constants, moreover a ∈ IR1 and σ > 0.
The uniformly distributed random variable over the interval (a, b) (see Example 4) is
absolutely continuous and its density function has the form

f (x) = 1/(b − a) if a < x < b,   0 if x ∉ (a, b).    (12)

It is obvious that the function (12) satisfies (9).


The exponentially distributed random variable with parameter λ > 0 (see Example 5) is
absolutely continuous and its density function has the form
f (x) = 0 if x ≤ 0,   λ e^{−λx} if x > 0.    (13)
It is obvious that the function (13) satisfies (9).
We obtain from (4) that for an absolutely continuous random variable
P (ω : a ≤ η(ω) ≤ b) = P (ω : a ≤ η(ω) < b) = P (ω : a < η(ω) ≤ b) = P (ω : a < η(ω) < b) = ∫_a^b f (x) dx.    (14)

A somewhat more intuitive interpretation of the density function may be obtained from
(14). If η(ω) is an absolutely continuous random variable having density function f (x),
then for small dx
P (ω : x ≤ η(ω) ≤ x + dx) = f (x) dx + o(dx).

Lemma 1. Let F (x) be a distribution function of a random variable η(ω). Then for any
real number x we have
P {ω : η(ω) = x} = F (x) − F (x − 0),

where F (x − 0) is the left–hand limit at x.

Since the distribution function of an absolutely continuous random variable is continuous at all points,
P (ω : η(ω) = x) = 0

for any fixed x.


Therefore this equation states that the probability that an absolutely continuous random
variable will assume any fixed value is zero.
In the general case (for any distribution function) we have the following formulae:
P (ω : a ≤ η(ω) ≤ b) = F (b) − F (a − 0),

P (ω : a ≤ η(ω) < b) = F (b − 0) − F (a − 0),

P (ω : a < η(ω) ≤ b) = F (b) − F (a),

P (ω : a < η(ω) < b) = F (b − 0) − F (a),


where F (a − 0) is the left-limit of F (x) at point a.
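For a concrete continuous distribution all four probabilities coincide, which is easy to check numerically; the added sketch below uses the exponential distribution with λ = 1 and the interval (0.5, 2):

    from scipy import stats

    F = stats.expon(scale=1.0).cdf    # exponential distribution function with lambda = 1
    a, b = 0.5, 2.0
    # For a continuous F, F(a - 0) = F(a) and F(b - 0) = F(b), so all four
    # interval probabilities above reduce to F(b) - F(a).
    print(F(b) - F(a))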

§6.1. The Gamma Distributions.


The Gamma distribution is important because it includes a wide class of specific distribu-
tions, some of which underlie fundamental statistical procedures. In addition to serving
as a utility distribution, the gamma provides probabilities for yet another random vari-
able associated with Poisson processes (the exponential distribution itself is a member
of the gamma distributions).

The Gamma Function.


In order to describe the Gamma distribution in detail, we must first consider a useful
function, the Gamma Function:
Γ(α) = ∫_0^{+∞} x^{α−1} e^{−x} dx,    α > 0.

The symbol Γ (Greek uppercase gamma) is reserved for this function. Integration by parts yields
Γ(α + 1) = α Γ(α).

Note that for any nonnegative integer k we have

Γ(k + 1) = k!.

In particular, Γ(1) = 1.
An important class involves half-integer values. We have


 
Γ(1/2) = √π,

and for any positive integer k

Γ(k + 1/2) = ((2k − 1)!! / 2^{k}) √π,

where (2k − 1)!! = 1 · 3 · 5 · ... · (2k − 1).


n!! denotes the double factorial, that is, the product of all numbers from n to 1 that have
the same parity as n. For example, 6!! = 6 · 4 · 2 = 48.
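The identities above can be verified numerically with scipy's gamma function (an added illustration):

    import math
    from scipy.special import gamma, factorial2

    print(gamma(5.5), 4.5 * gamma(4.5))       # Gamma(alpha + 1) = alpha * Gamma(alpha)
    print(gamma(6), math.factorial(5))        # Gamma(k + 1) = k!
    print(gamma(0.5), math.sqrt(math.pi))     # Gamma(1/2) = sqrt(pi)
    k = 3
    print(gamma(k + 0.5),                     # half-integer formula
          factorial2(2 * k - 1) / 2**k * math.sqrt(math.pi))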

In probability theory and statistics, the gamma distribution is a two-parameter family of


continuous probability distributions. The exponential distribution, Erlang distribution,
and chi-square distribution are special cases of the gamma distribution.

In Bayesian statistics, the Gamma distribution is a conjugate prior distribution for the exponential and Poisson distributions.

The Density Function of Gamma Random Variable

The following expression gives the density function for a gamma distribution.
f (x) = 0 if x ≤ 0,   (λ^{α} / Γ(α)) x^{α−1} e^{−λx} if x > 0.

The two parameters λ and α may be any positive values (λ > 0 and α > 0).
A special case of this function occurs when α = 1. We have

f (x) = 0 if x ≤ 0,   λ e^{−λx} if x > 0,

which is the density function for the exponential distribution.


The expectation and Variance of gamma distribution have the forms:

Eη = α / λ and Var (η) = α / λ².

When α is natural, say α = n, the gamma distribution with parameters (n, λ) often arises
in practice:
f (x) = 0 if x ≤ 0,   (λ^{n} / (n − 1)!) x^{n−1} e^{−λx} if x > 0.
This distribution is often referred to in the literature as the n-Erlang distribution. Note
that when n = 1, this distribution reduces to the exponential.
The Gamma distribution with λ = 1/2 and α = n/2 (n a natural number) is called the χ²_n (read
“chi-squared”) distribution with n degrees of freedom:

f (x) = 0 if x ≤ 0,   (1 / (2^{n/2} Γ(n/2))) x^{n/2−1} e^{−x/2} if x > 0.

We have

E χ²_n = n and Var χ²_n = 2n.
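A quick added check of these moment formulas against scipy's gamma and chi-square implementations:

    from scipy import stats

    alpha, lam = 3.0, 2.0
    g = stats.gamma(a=alpha, scale=1/lam)   # gamma distribution with parameters (alpha, lambda)
    print(g.mean(), alpha / lam)            # both equal 1.5
    print(g.var(), alpha / lam**2)          # both equal 0.75

    n = 7
    chi2 = stats.chi2(df=n)
    print(chi2.mean(), n)                   # E chi^2_n = n
    print(chi2.var(), 2 * n)                # Var chi^2_n = 2n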

§6.2. The Beta Distribution.

In Bayesian inference, the beta distribution is the conjugate prior probability distribution
for the Bernoulli, binomial, negative binomial and geometric distributions.

A random variable is said to have a Beta distribution if its density is given by


f (x) = (1 / B(a, b)) x^{a−1} (1 − x)^{b−1} if 0 < x < 1,   0 otherwise,

where
B(a, b) = ∫_0^{1} x^{a−1} (1 − x)^{b−1} dx.

Note that when a = b = 1 the beta density is the uniform over the interval [0, 1]. When a and b are greater than 1 the density is bell-shaped, but when they are less than 1 it is U-shaped. When a = b, the beta density is symmetric about 1/2. When b > a, the density is skewed to the left (in the sense that smaller values become more likely), and it is skewed to the right when a > b. The following relationship exists between the beta and
gamma functions:
B(a, b) = B(b, a) = ∫_0^{1} x^{a−1} (1 − x)^{b−1} dx =

(if we make the change of variable x = y/(1 + y))

= ∫_0^{+∞} y^{a−1} / (1 + y)^{a+b} dy = Γ(a) Γ(b) / Γ(a + b).

The expectation and Variance of beta distribution have the forms:

Eη = a / (a + b) and Var (η) = a b / ((a + b)² (a + b + 1)).
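Both the relation B(a, b) = Γ(a) Γ(b)/Γ(a + b) and the moment formulas can be checked numerically (an added illustration):

    from scipy import stats
    from scipy.special import beta as B, gamma as G

    a, b = 2.5, 4.0
    print(B(a, b), G(a) * G(b) / G(a + b))                 # the two values agree

    eta = stats.beta(a, b)
    print(eta.mean(), a / (a + b))                         # expectation
    print(eta.var(), a * b / ((a + b)**2 * (a + b + 1)))   # variance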

APPENDIX-1:
The correctness of Example 2: Indeed, the function (5) as a function of upper bound is continuous.
Therefore the Properties 4 and 3 are satisfied. Since the integrand is positive we also have Property 1.
Thus it is left to prove that
(1 / (σ √(2π))) ∫_{−∞}^{+∞} exp( −(y − a)² / (2σ²) ) dy = 1.
Let us make a change of variable:
x = (y − a) / σ,   σ dx = dy.
Therefore
(1 / √(2π)) ∫_{−∞}^{+∞} exp( −x² / 2 ) dx = 1.
To prove that F (x) is indeed a distribution function, we need to show that
A = ∫_{−∞}^{+∞} exp( −x² / 2 ) dx = √(2π).

Therefore

A² = ∫_{−∞}^{+∞} exp( −x² / 2 ) dx · ∫_{−∞}^{+∞} exp( −y² / 2 ) dy = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} exp( −(x² + y²) / 2 ) dx dy.

We now evaluate the double integral by means of a change of variables to polar coordinates. That is, let

x = r · cos ϕ,
y = r · sin ϕ.

As the area element in polar coordinates equals r · dr dϕ, therefore

dx dy = r dr dϕ.

Thus

A² = ∫_0^{2π} ∫_0^{∞} exp( −r² / 2 ) r dr dϕ = 2π ∫_0^{∞} r exp( −r² / 2 ) dr = −2π exp( −r² / 2 ) |_0^{∞} = 2π.

Hence A = √(2π) and the result is proved. Therefore, by the Theorem about the distribution function there exists a random variable whose distribution function has the form (5).

APPENDIX-2:
Let us prove that there is a relation between Beta and Gamma functions:
B(a, b) = B(b, a) = ∫_0^{1} x^{a−1} (1 − x)^{b−1} dx = Γ(a) Γ(b) / Γ(a + b).

Proof. Directly from the definition of the Gamma function we obtain:


Γ(a) Γ(b) = ∫_0^{+∞} ∫_0^{+∞} e^{−u} u^{a−1} e^{−v} v^{b−1} du dv.

Let us make in this integral the following change of variables:

u = t²,   v = z².

We have du = 2t dt and dv = 2z dz . Therefore, we get


Γ(a) Γ(b) = 4 ∫_0^{+∞} ∫_0^{+∞} e^{−(t² + z²)} t^{2(a−1)} z^{2(b−1)} t z dt dz.

This integral we can rewrite in the following form:


Γ(a) Γ(b) = ∫_{−∞}^{+∞} ∫_{−∞}^{+∞} e^{−(t² + z²)} |t|^{2a−1} |z|^{2b−1} dt dz.

Passing to polar coordinates

t = ρ cos ϕ z = ρ sin ϕ dt dz = ρ dρ dϕ

we obtain

Γ(a) Γ(b) = ∫_0^{2π} ∫_0^{+∞} e^{−ρ²} ρ^{2(a+b)−1} |cos ϕ|^{2a−1} |sin ϕ|^{2b−1} dρ dϕ.
Now making the change of variable ρ² = x, dx = 2ρ dρ, we get

Γ(a) Γ(b) = (1/2) ∫_0^{2π} |cos ϕ|^{2a−1} |sin ϕ|^{2b−1} dϕ · ∫_0^{+∞} e^{−x} x^{(a+b)−1} dx =

= Γ(a + b) ( 2 ∫_0^{π/2} (cos ϕ)^{2a−1} (sin ϕ)^{2b−1} dϕ ).

Denoting

cos² ϕ = s,   ds = −2 cos ϕ sin ϕ dϕ,

we get

= Γ(a + b) ∫_0^{1} s^{a−1} (1 − s)^{b−1} ds.
Finally we get

Γ(a) Γ(b) = B(a, b) Γ(a + b),

which was required to prove.

