Learning Bayesian Models With R - Sample Chapter
Become an expert in Bayesian machine learning methods using R and apply them to solve real-world Big Data problems
Preface
Bayesian inference provides a unified framework to deal with all sorts of uncertainties
when learning patterns from data using machine learning models and using them to
predict future observations. However, learning and implementing Bayesian
models is not easy for data science practitioners due to the level of mathematical
treatment involved. Also, applying Bayesian methods to real-world problems
requires high computational resources. With the recent advancements in cloud and
high-performance computing and easy access to computational resources, Bayesian
modeling has become more feasible to use for practical applications today. Therefore,
it would be advantageous for all data scientists and data engineers to understand
Bayesian methods and apply them in their projects to achieve better results.
The last two chapters are devoted to the latest developments in the field. One
chapter discusses deep learning, which uses a class of neural network models that
are currently at the frontier of artificial intelligence. The book concludes with the
application of Bayesian methods on Big Data using frameworks such as Hadoop
and Spark.
Chapter 1, Introducing the Probability Theory, covers the foundational concepts
of probability theory, particularly those aspects required for learning Bayesian
inference, which are presented to you in a simple and coherent manner.
Chapter 2, The R Environment, introduces you to the R environment. After reading
through this chapter, you will learn how to import data into R, select subsets of the
data for analysis, and write simple R programs using functions and control
structures. You will also get familiar with the graphical capabilities of R and some
advanced capabilities such as loop functions.
Chapter 3, Introducing Bayesian Inference, introduces you to the Bayesian
statistical framework. This chapter includes a description of the Bayesian theorem,
concepts such as prior and posterior probabilities, and different methods to
estimate the posterior distribution, such as MAP estimates, Monte Carlo simulations,
and variational estimates.
Chapter 4, Machine Learning Using Bayesian Inference, gives an overview of what
machine learning is and what some of its high-level tasks are. This chapter also
discusses the importance of Bayesian inference in machine learning, particularly in
the context of how it can help avoid important issues such as model overfitting and
how to select optimal models.
Chapter 5, Bayesian Regression Models, presents one of the most common supervised
machine learning tasks, namely, regression modeling, in the Bayesian framework.
Using an example, it shows how you can obtain tighter confidence intervals of
prediction with Bayesian regression models.
Chapter 6, Bayesian Classification Models, presents how to use the Bayesian framework
for another common machine learning task, classification. The two Bayesian models
of classification, Naïve Bayes and Bayesian logistic regression, are discussed along
with some important metrics for evaluating the performance of classifiers.
Chapter 7, Bayesian Models for Unsupervised Learning, introduces you to the concepts
behind unsupervised and semi-supervised machine learning and their Bayesian
treatment. The two most important Bayesian unsupervised models, the Bayesian
mixture model and LDA, are discussed.
In the classical definition, the probability of an event A is the fraction of outcomes favorable to A:

$$P(A) = \frac{N_A}{N}$$

Here, $N_A$ is the number of outcomes in which A occurs and $N$ is the total number of outcomes.
Probability distributions
In both classical and Bayesian approaches, a probability distribution function is
the central quantity, which captures all of the information about the relationship
between variables in the presence of uncertainty. A probability distribution assigns
a probability value to each measurable subset of outcomes of a random experiment.
The variable involved could be discrete or continuous, and univariate or multivariate.
Although people use slightly different terminologies, the commonly used probability
distributions for the different types of random variables are as follows:
$$N(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Here, $\mu$ is the mean or location parameter and $\sigma$ is the standard deviation or scale
parameter ($\sigma^2$ is called the variance). The following graphs show what the distribution
looks like for different values of the location and scale parameters:
One can see that as the mean changes, the location of the peak of the distribution
changes. Similarly, when the standard deviation changes, the width of the
distribution also changes.
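As a minimal sketch in R (the particular parameter values below are chosen for illustration and are not taken from the book's figures), you can reproduce this behavior with dnorm:

# Plot univariate normal densities for different location (mean) and
# scale (sd) parameters to see how the peak shifts and the width changes.
x <- seq(-10, 10, length.out = 500)

plot(x, dnorm(x, mean = 0, sd = 1), type = "l", col = "black",
     ylab = "density", main = "Normal densities")
lines(x, dnorm(x, mean = 3, sd = 1), col = "red")    # shifted location
lines(x, dnorm(x, mean = 0, sd = 2), col = "blue")   # larger scale (wider)
legend("topright", legend = c("mean=0, sd=1", "mean=3, sd=1", "mean=0, sd=2"),
       col = c("black", "red", "blue"), lty = 1)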
For a multivariate random variable $\mathbf{x} = [x_1, x_2, \ldots, x_N]$, the corresponding normal distribution is given by:
$$N(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = (2\pi)^{-N/2}\,|\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

In the two-dimensional case, the covariance matrix is:

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

Here, $\sigma_1^2$ and $\sigma_2^2$ are the variances along the $x_1$ and $x_2$ directions, and $\rho$ is the
correlation between $x_1$ and $x_2$. A plot of the two-dimensional normal distribution for
$\sigma_1^2 = 9$, $\sigma_2^2 = 4$, and $\rho = 0.8$ is shown in the following image:
The high correlation between x and y in the first case forces most of the data points
along the 45 degree line and makes the distribution more anisotropic; whereas, in the
second case, when the correlation is zero, the distribution is more isotropic.
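A quick way to see this effect in R is to simulate points under the two covariance settings and compare the scatter plots. This is a minimal sketch using MASS::mvrnorm; the sample size and plotting choices are arbitrary, not taken from the book:

library(MASS)  # provides mvrnorm for sampling from a multivariate normal

mu <- c(0, 0)
Sigma_corr <- matrix(c(9, 0.8 * 3 * 2,
                       0.8 * 3 * 2, 4), nrow = 2)   # rho = 0.8
Sigma_ind  <- matrix(c(9, 0,
                       0, 4), nrow = 2)             # rho = 0

set.seed(42)
x_corr <- mvrnorm(n = 2000, mu = mu, Sigma = Sigma_corr)
x_ind  <- mvrnorm(n = 2000, mu = mu, Sigma = Sigma_ind)

par(mfrow = c(1, 2))
plot(x_corr, asp = 1, pch = ".", main = "rho = 0.8 (anisotropic)")
plot(x_ind,  asp = 1, pch = ".", main = "rho = 0 (isotropic)")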
We will briefly review some of the other well-known distributions used in Bayesian
inference here.
Conditional probability
Often, one would be interested in finding the probability of the occurrence of a set
of random variables when other random variables in the problem are held fixed. As
an example from a population health study, one might want to find the probability
that a person in the age range 40-50 develops heart disease, given that they have
high blood pressure and diabetes. Questions such as these can be modeled using
conditional probability, which is defined as the probability of an event, given that
another event has happened. More formally, if we take the variables A and B, this
definition can be rewritten as follows:
$$P(A \mid B) = \frac{P(A, B)}{P(B)}$$

Similarly:

$$P(B \mid A) = \frac{P(A, B)}{P(A)}$$
In general, for a set of random variables, the conditional distribution is:

$$P(x_1, x_2, \ldots, x_N \mid z_1, z_2, \ldots, z_M) = \frac{P(x_1, x_2, \ldots, x_N, z_1, z_2, \ldots, z_M)}{P(z_1, z_2, \ldots, z_M)}$$

For the bivariate normal distribution, the conditional distribution of $x_1$ given $x_2$ is:

$$P(x_1 \mid x_2) = \frac{N(x_1, x_2)}{N(x_2)}$$
It can be shown (exercise 2 in the Exercises section of this chapter) that the RHS can be
simplified, resulting in an expression for $P(x_1 \mid x_2)$ that is again a normal distribution,
with mean $\mu = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)$ and variance $\sigma^2 = (1-\rho^2)\sigma_1^2$.
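As a quick numerical sanity check (a sketch, not from the book; the parameter values are arbitrary), you can simulate from a bivariate normal and compare the empirical conditional moments with these formulas:

library(MASS)

mu1 <- 1; mu2 <- 2; s1 <- 3; s2 <- 2; rho <- 0.8
Sigma <- matrix(c(s1^2, rho * s1 * s2,
                  rho * s1 * s2, s2^2), nrow = 2)

set.seed(1)
xy <- mvrnorm(n = 1e6, mu = c(mu1, mu2), Sigma = Sigma)

# Condition (approximately) on x2 being close to a chosen value
x2_val <- 3
x1_cond <- xy[abs(xy[, 2] - x2_val) < 0.05, 1]

mean(x1_cond)                             # empirical conditional mean
mu1 + rho * (s1 / s2) * (x2_val - mu2)    # theoretical conditional mean

var(x1_cond)                              # empirical conditional variance
(1 - rho^2) * s1^2                        # theoretical conditional variance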
Bayesian theorem
From the definition of the conditional probabilities $P(A \mid B)$ and $P(B \mid A)$, it is easy to
show the following:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Rev. Thomas Bayes (1701-1761) used this rule to formulate his famous Bayes
theorem, which can be interpreted as follows: if $P(A)$ represents the initial degree of belief
(or prior probability) in the value of a random variable A before observing B, then its
posterior probability, or degree of belief after accounting for B, gets updated according to the
preceding equation. So, Bayesian inference essentially corresponds to updating
beliefs about an uncertain system after having made some observations about it. In a
sense, this is also how we human beings learn about the world. For example, before
we visit a new city, we will have certain prior knowledge about the place after reading
about it in books or on the Web.
However, soon after we reach the place, this belief will get updated based on our initial
experience of the place. We continuously update the belief as we explore the new
city more and more. We will describe Bayesian inference more in detail in Chapter 3,
Introducing Bayesian Inference.
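As a toy illustration of this updating rule in R (the numbers below are invented purely for illustration and do not appear in the book), consider updating the belief that a person has a disease after a positive diagnostic test:

# Prior belief: 1% of the population has the disease
prior <- 0.01
# Assumed likelihoods: test sensitivity and false-positive rate
p_pos_given_disease <- 0.95
p_pos_given_healthy <- 0.05

# Marginal probability of a positive test, P(B)
p_pos <- p_pos_given_disease * prior + p_pos_given_healthy * (1 - prior)

# Posterior belief after observing a positive test, via Bayes theorem
posterior <- p_pos_given_disease * prior / p_pos
posterior   # about 0.16: the belief is updated from 1% to roughly 16%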
Marginal distribution
In many situations, we are interested only in the probability distribution of a subset
of random variables. For example, in the heart disease problem mentioned in the
previous section, if we want to infer the probability of people in a population
having a heart disease as a function of their age only, we need to integrate out the
effect of other random variables such as blood pressure and diabetes. This is called
marginalization:
$$P(x_1, x_2, \ldots, x_M) = \int P(x_1, x_2, \ldots, x_M, x_{M+1}, \ldots, x_N)\, dx_{M+1} \cdots dx_N$$

Or:

$$P(x_1, x_2, \ldots, x_M) = \int P(x_1, x_2, \ldots, x_M \mid x_{M+1}, \ldots, x_N)\, P(x_{M+1}, \ldots, x_N)\, dx_{M+1} \cdots dx_N$$
Note that marginal distribution is very different from conditional distribution.
In conditional probability, we are finding the probability of a subset of random
variables with values of other random variables fixed (conditioned) at a given
value. In the case of marginal distribution, we are eliminating the effect of a subset
of random variables by integrating them out (in the sense of averaging their effect)
from the joint distribution. For example, in the case of the two-dimensional normal
distribution, marginalization with respect to one variable will result in a
one-dimensional normal distribution of the other variable, as follows:

$$N(x_1) = \int N(x_1, x_2)\, dx_2$$
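To make this concrete, here is a small numerical sketch in R (with assumed, illustrative parameter values) that integrates $x_2$ out of a bivariate normal density and compares the result with the one-dimensional normal density of $x_1$:

# Bivariate normal density with zero means, written out explicitly
s1 <- 3; s2 <- 2; rho <- 0.8
dbvnorm <- function(x1, x2) {
  z <- x1^2 / s1^2 - 2 * rho * x1 * x2 / (s1 * s2) + x2^2 / s2^2
  exp(-z / (2 * (1 - rho^2))) / (2 * pi * s1 * s2 * sqrt(1 - rho^2))
}

x1 <- 1.5
# Numerically marginalize x2 out of the joint density
marginal <- integrate(function(x2) dbvnorm(x1, x2), -Inf, Inf)$value
marginal                       # numerical marginal density at x1
dnorm(x1, mean = 0, sd = s1)   # closed-form one-dimensional density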
The expectation of a random variable and the covariance between two random variables are defined from the joint distribution as follows:

$$E[x_i] = \int x_i \, P(x_1, x_2, \ldots, x_i, \ldots, x_N)\, dx_1 \cdots dx_N$$

$$\operatorname{cov}(x_i, x_j) = E\left[(x_i - E[x_i])(x_j - E[x_j])\right]$$
Binomial distribution
A binomial distribution is a discrete distribution that gives the probability of the number
of heads in n independent trials, where each trial has one of two possible outcomes, heads
or tails, with the probability of heads being p. Each of the trials is called a Bernoulli trial.
The functional form of the binomial distribution is given by:
$$P(k; n, p) = \frac{n!}{(n-k)!\,k!}\, p^k (1-p)^{n-k}$$
Here, P ( k ; n, p ) denotes the probability of having k heads in n trials. The mean of the
binomial distribution is given by np and variance is given by np(1-p). Have a look at
the following graphs:
The preceding graphs show the binomial distribution for two values of n, 100 and
1000, with p = 0.7. As you can see, when n becomes large, the binomial distribution
becomes sharply peaked. It can be shown that, in the large-n limit, a binomial
distribution can be approximated by a normal distribution with mean np and
variance np(1-p). This is a characteristic shared by many discrete distributions:
in the large-n limit, they can be approximated by continuous distributions.
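A short R sketch (not taken from the book; the plotting choices are mine) reproduces this comparison using dbinom and overlays the normal approximation:

p <- 0.7

par(mfrow = c(1, 2))
for (n in c(100, 1000)) {
  k <- 0:n
  plot(k, dbinom(k, size = n, prob = p), type = "h",
       main = paste("Binomial, n =", n, ", p =", p),
       xlab = "k", ylab = "probability")
  # Overlay the normal approximation with mean np and variance np(1 - p)
  curve(dnorm(x, mean = n * p, sd = sqrt(n * p * (1 - p))),
        add = TRUE, col = "red")
}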
Beta distribution
The Beta distribution, denoted by $\text{Beta}(x \mid \alpha, \beta)$, is a function of powers of $x$ and its
reflection $(1-x)$, and is given by:

$$\text{Beta}(x \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$$
Here, $\alpha, \beta > 0$ are parameters that determine the shape of the distribution
function, and $B(\alpha, \beta)$ is the Beta function given by the ratio of Gamma functions:

$$B(\alpha, \beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha + \beta)}$$
The Beta distribution is a very important distribution in Bayesian inference. It is the
conjugate prior probability distribution (which will be defined more precisely in the
next chapter) for binomial, Bernoulli, negative binomial, and geometric distributions.
It is used for modeling the random behavior of percentages and proportions. For
example, the Beta distribution has been used for modeling allele frequencies in
population genetics, time allocation in project management, the proportion of
minerals in rocks, and heterogeneity in the probability of HIV transmission.
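As a preview of how this conjugacy is used (a minimal sketch with made-up numbers; conjugate priors are treated properly in the next chapter), updating a Beta prior after observing binomial data only requires adding the counts to the parameters:

# Beta prior on the probability of heads, p
alpha0 <- 2; beta0 <- 2

# Suppose we observe 7 heads in 10 coin tosses (illustrative data)
heads <- 7; tosses <- 10

# Because the Beta is conjugate to the binomial, the posterior is also Beta
alpha1 <- alpha0 + heads
beta1  <- beta0 + tosses - heads

p <- seq(0, 1, length.out = 200)
plot(p, dbeta(p, alpha1, beta1), type = "l", col = "red",
     ylab = "density", main = "Prior vs posterior for p")
lines(p, dbeta(p, alpha0, beta0), col = "blue")
legend("topleft", legend = c("posterior", "prior"),
       col = c("red", "blue"), lty = 1)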
Gamma distribution
The Gamma distribution, denoted by $\text{Gamma}(x \mid \alpha, \beta)$, is another common distribution
used in Bayesian inference. It is used for modeling waiting times, such as survival
rates. Special cases of the Gamma distribution are the well-known Exponential and
Chi-Square distributions.

In Bayesian inference, the Gamma distribution is used as a conjugate prior for the
inverse of the variance of a one-dimensional normal distribution and for parameters such as
the rate ($\lambda$) of an exponential or Poisson distribution.
The mathematical form of a Gamma distribution is given by:
$$\text{Gamma}(x \mid \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} \exp(-\beta x)$$
Here, $\alpha$ and $\beta$ are the shape and rate parameters, respectively (both take values
greater than zero). There is also a form in terms of the scale parameter ($\theta = 1/\beta$), which
is common in econometrics. Another related distribution is the Inverse-Gamma
distribution, which is the distribution of the reciprocal of a variable that is distributed
according to the Gamma distribution. It is mainly used in Bayesian inference as the
conjugate prior distribution for the variance of a one-dimensional normal distribution.
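In R, dgamma supports both parameterizations directly; a quick sketch (the parameter values are chosen arbitrarily for illustration):

x <- seq(0, 10, length.out = 400)
alpha <- 2; beta <- 1.5   # shape and rate

# The rate and scale parameterizations give identical densities
all.equal(dgamma(x, shape = alpha, rate = beta),
          dgamma(x, shape = alpha, scale = 1 / beta))

plot(x, dgamma(x, shape = alpha, rate = beta), type = "l",
     ylab = "density", main = "Gamma(shape = 2, rate = 1.5)")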
Dirichlet distribution
The Dirichlet distribution is a multivariate analogue of the Beta distribution.
It is commonly used in Bayesian inference as the conjugate prior distribution for
multinomial distribution and categorical distribution. The main reason for this is
that it is easy to implement inference techniques, such as Gibbs sampling, on the
Dirichlet-multinomial distribution.
The Dirichlet distribution of order K is defined over an open (K-1)-dimensional
simplex as follows:

$$\text{Dir}(\mathbf{x} \mid \boldsymbol{\alpha}) = \frac{1}{B(\boldsymbol{\alpha})} \prod_{i=1}^{K} x_i^{\alpha_i - 1}$$
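There is no Dirichlet sampler in base R, but a standard construction (sketched below with arbitrary parameter values) draws independent Gamma variables and normalizes them; packages such as gtools also provide a ready-made rdirichlet:

# Sample from a Dirichlet(alpha) by normalizing independent Gamma draws
rdirichlet_simple <- function(n, alpha) {
  k <- length(alpha)
  g <- matrix(rgamma(n * k, shape = alpha, rate = 1),
              nrow = n, byrow = TRUE)
  g / rowSums(g)   # each row sums to 1, i.e. lies on the simplex
}

set.seed(7)
samples <- rdirichlet_simple(5, alpha = c(2, 3, 5))
samples
rowSums(samples)   # all equal to 1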
Wishart distribution
The Wishart distribution is a multivariate generalization of the Gamma distribution. It
is defined over symmetric, non-negative definite matrix-valued random variables. In Bayesian
inference, it is used as the conjugate prior to estimate the distribution of the inverse of
the covariance matrix $\Sigma^{-1}$ (or precision matrix) of the normal distribution. When we
discussed the Gamma distribution, we said it is used as a conjugate distribution for the
inverse of the variance of the one-dimensional normal distribution.
The mathematical definition of the Wishart distribution is as follows:
$$W_p(\mathbf{X} \mid \mathbf{V}, n) = \frac{|\mathbf{X}|^{(n-p-1)/2} \exp\left(-\tfrac{1}{2}\operatorname{tr}(\mathbf{V}^{-1}\mathbf{X})\right)}{2^{np/2}\, |\mathbf{V}|^{n/2}\, \Gamma_p\!\left(\tfrac{n}{2}\right)}$$

Here, $\mathbf{X}$ and $\mathbf{V}$ are $p \times p$ symmetric positive definite matrices, $n$ is the degrees of freedom, and $\Gamma_p$ is the multivariate Gamma function.
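Base R's stats package provides a sampler for this distribution; a brief sketch (the scale matrix and degrees of freedom below are chosen for illustration):

# Draw 3 samples from a Wishart distribution with 5 degrees of freedom
# and scale matrix V (each draw is a 2 x 2 symmetric positive definite matrix)
V <- matrix(c(2, 0.5,
              0.5, 1), nrow = 2)
set.seed(123)
draws <- rWishart(n = 3, df = 5, Sigma = V)

dim(draws)    # 2 x 2 x 3 array: one matrix per draw
draws[, , 1]  # the first sampled matrix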
Exercises
1. By using the definition of conditional probability, show that any multivariate
joint distribution of N random variables $[x_1, x_2, \ldots, x_N]$ has the following trivial
factorization:

$$P(x_1, x_2, \ldots, x_N) = P(x_1 \mid x_2, \ldots, x_N)\, P(x_2 \mid x_3, \ldots, x_N) \cdots P(x_{N-1} \mid x_N)\, P(x_N)$$
2. The bivariate normal distribution is given by:

$$N(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = (2\pi)^{-1}\,|\Sigma|^{-1/2} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

Here:

$$\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}$$

Show that the conditional distribution $P(x_1 \mid x_2)$ is a normal distribution where
$\mu = \mu_1 + \rho\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)$ and $\sigma^2 = (1-\rho^2)\sigma_1^2$.
Sepal Length   Sepal Width   Petal Length   Petal Width   Class of Flower
5.1            3.5           1.4            0.2           Iris-setosa
4.9            3.0           1.4            0.2           Iris-setosa
4.7            3.2           1.3            0.2           Iris-setosa
4.6            3.1           1.5            0.2           Iris-setosa
5.0            3.6           1.4            0.2           Iris-setosa
7.0            3.2           4.7            1.4           Iris-versicolor
6.4            3.2           4.5            1.5           Iris-versicolor
6.9            3.1           4.9            1.5           Iris-versicolor
5.5            2.3           4.0            1.3           Iris-versicolor
6.5            2.8           4.6            1.5           Iris-versicolor
6.3            3.3           6.0            2.5           Iris-virginica
5.8            2.7           5.1            1.9           Iris-virginica
7.1            3.0           5.9            2.1           Iris-virginica
6.3            2.9           5.6            1.8           Iris-virginica
6.5            3.0           5.8            2.2           Iris-virginica
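These are rows of the classic Iris dataset, which ships with R, so no download is needed to work with it; a quick way to inspect it:

data(iris)              # load the built-in Iris dataset
head(iris)              # first rows: sepal/petal measurements and species
summary(iris$Species)   # 50 flowers of each of the three species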
References

1. http://en.wikipedia.org/wiki/List_of_probability_distributions
2. Feller W. An Introduction to Probability Theory and Its Applications. Vol. 1. Wiley Series in Probability and Mathematical Statistics. 1968. ISBN-10: 0471257087
3. Jaynes E.T. Probability Theory: The Logic of Science. Cambridge University Press. 2003. ISBN-10: 0521592712
4. Radziwill N.M. Statistics (The Easier Way) with R: an informal text on applied statistics. Lapis Lucera. 2015. ISBN-10: 0692339426
Summary
To summarize this chapter, we discussed elements of probability theory;
particularly those aspects required for learning Bayesian inference. Due to lack of
space, we have not covered many elementary aspects of this subject. There are some
excellent books on this subject, for example, books by William Feller (reference 2
in the References section of this chapter), E. T. Jaynes (reference 3 in the References
section of this chapter), and N. M. Radziwill (reference 4 in the References section of this
chapter). Readers are encouraged to read these to get a more in-depth understanding
of probability theory and how it can be applied in real-life situations.
In the next chapter, we will introduce the R programming language, which is the
most popular open source framework for data analysis and, in particular,
Bayesian inference.