Introduction to Bayesian Statistics
Ebook · 1,107 pages · 9 hours


About this ebook

"...this edition is useful and effective in teaching Bayesian inference at both elementary and intermediate levels. It is a well-written book on elementary Bayesian inference, and the material is easily accessible. It is both concise and timely, and provides a good collection of overviews and reviews of important tools used in Bayesian statistical methods."

There is a strong upsurge in the use of Bayesian methods in applied statistical analysis, yet most introductory statistics texts only present frequentist methods. Bayesian statistics has many important advantages that students should learn about if they are going into fields where statistics will be used. In this Third Edition, four newly added chapters address topics that reflect the rapid advances in the field of Bayesian statistics. The authors continue to provide a Bayesian treatment of introductory statistical topics, such as scientific data gathering, discrete random variables, robust Bayesian methods, and Bayesian approaches to inference for discrete random variables, binomial proportions, Poisson and normal means, and simple linear regression. In addition, more advanced topics in the field are presented in four new chapters: Bayesian inference for a normal with unknown mean and variance; Bayesian inference for a Multivariate Normal mean vector; Bayesian inference for the Multiple Linear Regression Model; and Computational Bayesian Statistics including Markov Chain Monte Carlo. The inclusion of these topics will help readers advance from a minimal understanding of statistics to the ability to tackle topics in more applied, advanced-level books. Minitab macros and R functions are available on the book's related website to assist with chapter exercises. Introduction to Bayesian Statistics, Third Edition also features:

  • Topics including the joint likelihood function and inference using independent Jeffreys priors and the joint conjugate prior
  • The cutting-edge topic of computational Bayesian Statistics in a new chapter, with a unique focus on Markov Chain Monte Carlo methods
  • Exercises throughout the book that have been updated to reflect new applications and the latest software
  • Detailed appendices that guide readers through the use of R and Minitab software for Bayesian analysis and Monte Carlo simulations, with all related macros available on the book's website

Introduction to Bayesian Statistics, Third Edition is a textbook for upper-undergraduate or first-year graduate-level courses in introductory statistics with a Bayesian emphasis. It can also be used as a reference work for statisticians who require a working knowledge of Bayesian statistics.

Language: English
Publisher: Wiley
Release date: Sep 2, 2016
ISBN: 9781118593226

    Book preview

    Introduction to Bayesian Statistics - William M. Bolstad

    PREFACE

    Our original goal for this book was to introduce Bayesian statistics at the earliest possible stage to students with a reasonable mathematical background. This entailed covering a similar range of topics to an introductory statistics text, but from a Bayesian perspective. The emphasis is on statistical inference. We wanted to show how Bayesian methods can be used for inference and how they compare favorably with the frequentist alternatives. This book is meant to be a good place to start the study of Bayesian statistics. From the many positive comments we have received from users, we think the book has succeeded in its goal. A course based on this goal would include Chapters 1-14.

    Our feedback also showed that many users were taking up the book at a more intermediate level instead of the introductory level originally envisaged. The topics covered in Chapters 2 and 3 would be old hat for these users, so we would have to include some more advanced material to cater for the needs of that group. The second edition aimed to meet this new goal as well as the original goal. We included more models, mainly with a single parameter. Nuisance parameters were dealt with using approximations. A course based on this goal would include Chapters 4-16.

    Changes in the Third Edition

    Later feedback showed that some readers with stronger mathematical and statistical background wanted the text to include more details on how to deal with multi-parameter models. The third edition contains four new chapters to satisfy this additional goal, along with some minor rewriting of the existing chapters. Chapter 17 covers Bayesian inference for Normal observations where we do not know either the mean or the variance. This chapter extends the ideas in Chapter 11, and also discusses the two-sample case, which in turn allows the reader to consider inference on the difference between two means. Chapter 18 introduces the Multivariate Normal distribution, which we need in order to discuss multiple linear regression in Chapter 19. Finally, Chapter 20 takes the user beyond the kind of conjugate analysis that is considered in most of the book, and into the realm of computational Bayesian inference. The topics covered in Chapter 20 are treated with an intentionally light touch, but they still give the user valuable information and skills that will allow them to deal with different problems. We have included some new exercises and new computer exercises which use new Minitab macros and R functions. The Minitab macros can be downloaded from the book website: http://introbayes.ac.nz. The new R functions have been incorporated in a new and improved version of the R package Bolstad, which can either be downloaded from a CRAN mirror or installed directly in R using the internet. Instructions on the use and installation of the Minitab macros and the Bolstad package in R are given in Appendices C and D respectively. Both of these appendices have been rewritten to accommodate changes in R and Minitab that have occurred since the second edition.
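
    As a minimal sketch for R users (installation details such as the CRAN mirror will vary; the package name Bolstad is the one given above), the package can be installed and loaded as follows:

        # Install the Bolstad package from a CRAN mirror (needed once),
        # then load it at the start of each R session.
        install.packages("Bolstad")
        library(Bolstad)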

    Our Perspective on Bayesian Statistics

    A book can be characterized as much by what is left out as by what is included. This book is our attempt to show a coherent view of Bayesian statistics as a good way to do statistical inference. Details that are outside the scope of the text are included in footnotes. Here are some of our reasons behind our choice of the topics we either included or excluded.

    In particular, we did not mention decision theory or loss functions when discussing Bayesian statistics. In many books, Bayesian statistics gets compartmentalized into decision theory while inference is presented in the frequentist manner. While decision theory is a very interesting topic in its own right, we want to present the case for Bayesian statistical inference, and did not want to get side-tracked.

    We think that in order to get the full benefit of Bayesian statistics, one really has to consider all priors subjective. They are either (1) a summary of what you believe or (2) a summary of all you allow yourself to believe initially. We consider the subjective prior as the relative weights given to each possible parameter value, before looking at the data. Even if we use a flat prior to give all possible values equal prior weight, it is subjective since we chose it. In any case, it gives all values equal weight only in that parameterization, so it can be considered objective only in that parameterization. In this book we do not wish to dwell on the problems associated with trying to be objective in Bayesian statistics. We explain why universal objectivity is not possible (in a footnote since we do not want to distract the reader). We want to leave him/her with the relative weight idea of the prior in the parameterization in which they have the problem.

    In the first edition we did not mention Jeffreys' prior explicitly, although the beta prior for the binomial and the flat prior for the normal mean are in fact the Jeffreys' priors for those respective observation distributions. In the second edition we do mention Jeffreys' prior for the binomial, Poisson, normal mean, and normal standard deviation. In the third edition we mention the independent Jeffreys' priors for the normal mean and standard deviation. In particular, we don't want to get the reader involved with the problems about Jeffreys' prior, such as for mean and variance together, as opposed to independent Jeffreys' priors, or the Jeffreys' prior violating the likelihood principle. These are beyond the level to which we wish to go. We just want the reader to note the Jeffreys' prior in these cases as possible priors, the relative weights they give, when they may be appropriate, and how to use them. Mathematically, all parameterizations are equally valid; however, usually only the main one is very meaningful. We want the reader to focus on relative weights for their parameterization as the prior. It should be (a) a summary of their prior belief (a conjugate prior matching their prior beliefs about moments or the median), (b) flat (hence objective) for their parameterization, or (c) some other form that gives reasonable weight over the whole range of possible values. The posteriors will be similar for all priors that have reasonable weight over the whole range of possible values.

    The Bayesian inference on the standard deviation of the normal was done where the mean is considered a known parameter. The conjugate prior for the variance is the inverse chi-squared distribution. Our intuition is about the standard deviation, yet we are doing Bayes' theorem on the variance. This required introducing the change of variable formula for the prior density.

    In the second edition we considered the mean as known. This avoided the mathematically more advanced case where both mean and standard deviation are unknown. In the third edition we now cover this topic in Chapter 17. In earlier editions the Student's t is presented as the required adjustment to credible intervals for the mean when the variance is estimated from the data. In the third edition we show in Chapter 17 that this is in fact the result when the joint posterior is found and the variance is marginalized out. Chapter 17 also covers inference on the difference in two means. This problem is made substantially harder when one relaxes the assumption that both populations have the same variance. Chapter 17 derives the Bayesian solution to the well-known Behrens-Fisher problem for the difference in two population means with unequal population variances. The function bayes.t.test in the R package for this book actually gives the user a numerical solution using Gibbs sampling. Gibbs sampling is covered in Chapter 20 of this new edition.
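
    As a purely illustrative sketch (the data are made up, and the call assumes an interface that mirrors R's t.test; consult the package documentation for the actual arguments), a two-sample comparison might look like:

        library(Bolstad)

        # Two small made-up samples to compare.
        x <- c(12.1, 11.8, 12.6, 12.3, 11.9)
        y <- c(11.2, 11.7, 11.4, 10.9, 11.6)

        # bayes.t.test is named in the text; we assume it can be called on two
        # samples in the same way as t.test, with Gibbs sampling done internally.
        bayes.t.test(x, y)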

    Acknowledgments

    WMB would like to thank all the readers who have sent him comments and pointed out misprints in the first and second editions. These have been corrected. WMB would like to thank Cathy Akritas and Gonzalo Ovalles at Minitab for help in improving his Minitab macros. WMB and JMC would like to thank Jon Gurstelle, Steve Quigley, Sari Friedman, Allison McGinniss, and the team at John Wiley & Sons for their support.

    Finally, last but not least, WMB wishes to thank his wife Sylvie for her constant love and support.

    WILLIAM M. “BILL” BOLSTAD

    Hamilton, New Zealand

    JAMES M. CURRAN

    Auckland, New Zealand

    CHAPTER 1

    INTRODUCTION TO STATISTICAL SCIENCE

    Statistics is the science that relates data to specific questions of interest. This includes devising methods to gather data relevant to the question, methods to summarize and display the data to shed light on the question, and methods that enable us to draw answers to the question that are supported by the data. Data almost always contain uncertainty. This uncertainty may arise from selection of the items to be measured, or it may arise from variability of the measurement process. Drawing general conclusions from data is the basis for increasing knowledge about the world, and is the basis for all rational scientific inquiry. Statistical inference gives us methods and tools for doing this despite the uncertainty in the data. The methods used for analysis depend on the way the data were gathered. It is vitally important that there is a probability model explaining how the uncertainty gets into the data.

    Showing a Causal Relationship from Data

    Suppose we have observed two variables X and Y. Variable X appears to have an association with variable Y. If high values of X occur with high values of variable Y and low values of X occur with low values of Y, then we say the association is positive. On the other hand, the association could be negative, in which case high values of variable X occur with low values of variable Y. Figure 1.1 shows a schematic diagram where the association is indicated by the dashed curve connecting X and Y. The unshaded area indicates that X and Y are observed variables. The shaded area indicates that there may be additional variables that have not been observed.


    Figure 1.1 Association between two variables.

    Diagram shows variables in the unshaded area X and Y connected by a dashed curve and an arrow from X to Y depicting the causal relationship.

    Figure 1.2 Association due to causal relationship.

    We would like to determine why the two variables are associated. There are several possible explanations. The association might be a causal one. For example, X might be the cause of Y. This is shown in Figure 1.2, where the causal relationship is indicated by the arrow from X to Y.

    On the other hand, there could be an unidentified third variable Z that has a causal effect on both X and Y. They are not related in a direct causal relationship. The association between them is due to the effect of Z. Z is called a lurking variable, since it is hiding in the background and it affects the data. This is shown in Figure 1.3.

    Diagram shows variables in the unshaded area X and Y connected by a dashed curve and arrows from lurking variable Z in the shaded region to X and Y.

    Figure 1.3 Association due to lurking variable.

    Diagram shows variables in the unshaded area X and Y connected by a dashed curve, an arrow from X to Y and arrows from lurking variable Z in the shaded region to X and Y.

    Figure 1.4 Confounded causal and lurking variable effects.

    It is possible that both a causal effect and a lurking variable may both be contributing to the association. This is shown in Figure 1.4. We say that the causal effect and the effect of the lurking variable are confounded. This means that both effects are included in the association.

    Our first goal is to determine which of the possible reasons for the association holds. If we conclude that it is due to a causal effect, then our next goal is to determine the size of the effect. If we conclude that the association is due to causal effect confounded with the effect of a lurking variable, then our next goal becomes determining the sizes of both the effects.

    1.1 The Scientific Method: A Process for Learning

    In the Middle Ages, science was deduced from principles set down many centuries earlier by authorities such as Aristotle. The idea that scientific theories should be tested against real-world data revolutionized thinking. This way of thinking, known as the scientific method, sparked the Renaissance.

    The scientific method rests on the following premises:

    A scientific hypothesis can never be shown to be absolutely true.

    However, it must potentially be disprovable.

    It is a useful model until it is established that it is not true.

    Always go for the simplest hypothesis, unless it can be shown to be false.

    This last principle, elaborated by William of Ockham in the 13th century, is now known as Ockham’s razor and is firmly embedded in science. It keeps science from developing fanciful, overly elaborate theories. Thus the scientific method directs us through an improving sequence of models, as previous ones get falsified. The scientific method generally follows this procedure:

    Ask a question or pose a problem in terms of the current scientific hypothesis.

    Gather all the relevant information that is currently available. This includes the current knowledge about parameters of the model.

    Design an investigation or experiment that addresses the question from step 1. The predicted outcome of the experiment should be one thing if the current hypothesis is true, and something else if the hypothesis is false.

    Gather data from the experiment.

    Draw conclusions given the experimental results. Revise the knowledge about the parameters to take the current results into account.

    The scientific method searches for cause-and-effect relationships between an experimental variable and an outcome variable. In other words, it asks how changing the experimental variable results in a change in the outcome variable. Scientific modeling develops mathematical models of these relationships. Both of them need to isolate the experiment from outside factors that could affect the experimental results. All outside factors that can be identified as possibly affecting the results must be controlled. It is no coincidence that the earliest successes for the method were in physics and chemistry, where the few relevant outside factors could be identified and controlled. Thus there were no lurking variables. All other relevant variables could be identified and could then be physically controlled by being held constant. That way they would not affect the results of the experiment, and the effect of the experimental variable on the outcome variable could be determined. In biology, medicine, engineering, technology, and the social sciences it is not that easy to identify the relevant factors that must be controlled. In those fields a different way to control outside factors is needed, because they cannot be identified beforehand and physically controlled.

    1.2 The Role of Statistics in the Scientific Method

    Statistical methods of inference can be used when there is random variability in the data. The probability model for the data is justified by the design of the investigation or experiment. This can extend the scientific method into situations where the relevant outside factors cannot even be identified. Since we cannot identify these outside factors, we cannot control them directly. The lack of direct control means the outside factors will be affecting the data. There is a danger that the wrong conclusions could be drawn from the experiment due to these uncontrolled outside factors.

    The important statistical idea of randomization has been developed to deal with this possibility. The unidentified outside factors can be averaged out by randomly assigning each unit to either treatment or control group. This contributes variability to the data. Statistical conclusions always have some uncertainty or error due to variability in the data. We can develop a probability model of the data variability based on the randomization used. Randomization not only reduces this uncertainty due to outside factors, it also allows us to measure the amount of uncertainty that remains using the probability model. Randomization lets us control the outside factors statistically, by averaging out their effects.

    Underlying this is the idea of a statistical population, consisting of all possible values of the observations that could be made. The data consists of observations taken from a sample of the population. For valid inferences about the population parameters from the sample statistics, the sample must be representative of the population. Amazingly, choosing the sample randomly is the most effective way to get representative samples!

    1.3 Main Approaches to Statistics

    There are two main philosophical approaches to statistics. The first is often referred to as the frequentist approach. Sometimes it is called the classical approach. Procedures are developed by looking at how they perform over all possible random samples. The probabilities do not relate to the particular random sample that was obtained. In many ways this indirect method places the cart before the horse.

    The alternative approach that we take in this book is the Bayesian approach. It applies the laws of probability directly to the problem. This offers many fundamental advantages over the more commonly used frequentist approach. We will show these advantages over the course of the book.

    Frequentist Approach to Statistics

    Most introductory statistics books take the frequentist approach to statistics, which is based on the following ideas:

    Parameters, the numerical characteristics of the population, are fixed but unknown constants.

    Probabilities are always interpreted as long-run relative frequency.

    Statistical procedures are judged by how well they perform in the long run over an infinite number of hypothetical repetitions of the experiment.

    Probability statements are only allowed for random quantities. The unknown parameters are fixed, not random, so probability statements cannot be made about their value. Instead, a sample is drawn from the population, and a sample statistic is calculated. The probability distribution of the statistic over all possible random samples from the population is determined and is known as the sampling distribution of the statistic. A parameter of the population will also be a parameter of the sampling distribution. The probability statement that can be made about the statistic based on its sampling distribution is converted to a confidence statement about the parameter. The confidence is based on the average behavior of the procedure over all possible samples.

    Bayesian Approach to Statistics

    The Reverend Thomas Bayes first discovered the theorem that now bears his name. It was written up in a paper, An Essay Towards Solving a Problem in the Doctrine of Chances. This paper was found after his death by his friend Richard Price, who had it published posthumously in the Philosophical Transactions of the Royal Society in 1763. Bayes showed how inverse probability could be used to calculate the probability of antecedent events from the occurrence of the consequent event. His methods were adopted by Laplace and other scientists in the 19th century, but had largely fallen from favor by the early 20th century. By the middle of the 20th century, interest in Bayesian methods had been renewed by de Finetti, Jeffreys, Savage, and Lindley, among others. They developed a complete method of statistical inference based on Bayes’ theorem.

    This book introduces the Bayesian approach to statistics. The ideas that form the basis of this approach are:

    Since we are uncertain about the true value of the parameters, we will consider them to be random variables.

    The rules of probability are used directly to make inferences about the parameters.

    Probability statements about parameters must be interpreted as degree of belief. The prior distribution must be subjective. Each person can have his/her own prior, which contains the relative weights that person gives to every possible parameter value. It measures how plausible the person considers each parameter value to be before observing the data.

    We revise our beliefs about parameters after getting the data by using Bayes’ theorem. This gives our posterior distribution which gives the relative weights we give to each parameter value after analyzing the data. The posterior distribution comes from two sources: the prior distribution and the observed data.

    This has a number of advantages over the conventional frequentist approach. Bayes’ theorem is the only consistent way to modify our beliefs about the parameters given the data that actually occurred. This means that the inference is based on the data that actually occurred, not on all possible data sets that might have occurred but did not! Allowing the parameter to be a random variable lets us make probability statements about it, posterior to the data. This contrasts with the conventional approach, where inference probabilities are based on all possible data sets that could have occurred for the fixed parameter value. Given the actual data, there is nothing random left with a fixed parameter value, so one can only make confidence statements, based on what could have occurred. Bayesian statistics also has a general way of dealing with a nuisance parameter. A nuisance parameter is one that we do not want to make inferences about, but that we do not want to interfere with the inferences we are making about the main parameters. Frequentist statistics does not have a general procedure for dealing with nuisance parameters. Bayesian statistics is predictive, unlike conventional frequentist statistics. This means that we can easily find the conditional probability distribution of the next observation given the sample data.
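
    To make the prior-to-posterior updating concrete, here is a minimal sketch in R with a made-up three-value parameter space for a binomial proportion and made-up data (3 successes in 10 trials):

        # Discrete Bayes' theorem: posterior is proportional to prior times likelihood.
        theta      <- c(0.2, 0.5, 0.8)             # possible parameter values
        prior      <- c(1/3, 1/3, 1/3)             # equal prior weights
        likelihood <- dbinom(3, size = 10, prob = theta)
        posterior  <- prior * likelihood / sum(prior * likelihood)
        round(posterior, 3)                        # relative weights after seeing the data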

    Monte Carlo Studies

    In frequentist statistics, the parameter is considered a fixed, but unknown, constant. A statistical procedure such as a particular estimator for the parameter cannot be judged from the value it gives. The parameter is unknown, so we cannot know the value the estimator should be giving. If we knew the value of the parameter, we would not be using an estimator.

    Instead, statistical procedures are evaluated by looking at how they perform in the long run over all possible samples of data, for fixed parameter values over some range. For instance, we fix the parameter at some value. The estimator depends on the random sample, so it is considered a random variable having a probability distribution. This distribution is called the sampling distribution of the estimator, since its probability distribution comes from taking all possible random samples. Then we look at how the estimator is distributed around the parameter value. This is called sample space averaging. Essentially it compares the performance of procedures before we take any data.

    Bayesian procedures consider the parameter to be a random variable, and its posterior distribution is conditional on the sample data that actually occurred, not all those samples that were possible but did not occur. However, before the experiment, we might want to know how well the Bayesian procedure works at some specific parameter values in the range.

    To evaluate the Bayesian procedure using sample space averaging, we have to consider the parameter to be both a random variable and a fixed but unknown value at the same time. We can get past the apparent contradiction in the nature of the parameter because the probability distribution we put on the parameter measures our uncertainty about the true value. It shows the relative belief weights we give to the possible values of the unknown parameter! After looking at the data, our belief distribution over the parameter values has changed. This way we can think of the parameter as a fixed, but unknown, value at the same time as we think of it being a random variable. This allows us to evaluate the Bayesian procedure using sample space averaging. This is called pre-posterior analysis because it can be done before we obtain the data.

    In Chapter 4, we will find out that the laws of probability are the best way to model uncertainty. Because of this, Bayesian procedures will be optimal in the post-data setting, given the data that actually occurred. In Chapters 9 and 11, we will see that Bayesian procedures perform very well in the pre-data setting when evaluated using pre-posterior analysis. In fact, it is often the case that Bayesian procedures outperform the usual frequentist procedures even in the pre-data setting.

    Monte Carlo studies are a useful way to perform sample space averaging. We draw a large number of samples randomly using the computer and calculate the statistic (frequentist or Bayesian) for each sample. The empirical distribution of the statistic (over the large number of random samples) approximates its sampling distribution (over all possible random samples). We can calculate statistics such as mean and standard deviation on this Monte Carlo sample to approximate the mean and standard deviation of the sampling distribution. Some small-scale Monte Carlo studies are included as exercises.
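
    As a small sketch of such a study (all settings here are arbitrary choices), the following R code approximates the sampling distribution of the sample proportion for a fixed parameter value:

        # Monte Carlo approximation of the sampling distribution of the sample proportion.
        set.seed(1)                     # arbitrary seed so the run is reproducible
        n      <- 25                    # sample size (illustrative choice)
        p.true <- 0.4                   # fixed parameter value for the study
        y      <- rbinom(10000, size = n, prob = p.true)   # 10,000 simulated samples
        p.hat  <- y / n                 # the statistic, one value per simulated sample
        mean(p.hat)                     # approximates the mean of the sampling distribution
        sd(p.hat)                       # approximates its standard deviation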

    1.4 Purpose and Organization of This Text

    A very large proportion of undergraduates are required to take a service course in statistics. Almost all of these courses are based on frequentist ideas. Most of them do not even mention Bayesian ideas. As a statistician, I know that Bayesian methods have great theoretical advantages. I think we should be introducing our best students to Bayesian ideas from the beginning. There are not many introductory statistics textbooks based on Bayesian ideas. Some other texts include Berry (1996), Press (1989), and Lee (1989).

    This book aims to introduce students with a good mathematics background to Bayesian statistics. It covers the same topics as a standard introductory statistics text, only from a Bayesian perspective. Students need reasonable algebra skills to follow this book. Bayesian statistics uses the rules of probability, so competence in manipulating mathematical formulas is required. Students will find that general knowledge of calculus is helpful in reading this book. Specifically they need to know that area under a curve is found by integrating, and that a maximum or minimum of a continuous differentiable function is found where the derivative of the function equals zero. However, the actual calculus used is minimal. The book is self-contained with a calculus appendix that students can refer to.

    Chapter 2 introduces some fundamental principles of scientific data gathering to control the effects of unidentified factors. These include the need for drawing samples randomly, along with some random sampling techniques. The reason why there is a difference between the conclusions we can draw from data arising from an observational study and from data arising from a randomized experiment is shown. Completely randomized designs and randomized block designs are discussed.

    Chapter 3 covers elementary methods for graphically displaying and summarizing data. Often a good data display is all that is necessary. The principles of designing displays that are true to the data are emphasized.

    Chapter 4 shows the difference between deduction and induction. Plausible reasoning is shown to be an extension of logic where there is uncertainty. It turns out that plausible reasoning must follow the same rules as probability. The axioms of probability are introduced and the rules of probability, including conditional probability and Bayes’ theorem are developed.

    Chapter 5 covers discrete random variables, including joint and marginal discrete random variables. The binomial, hypergeometric, and Poisson distributions are introduced, and the situations where they arise are characterized.

    Chapter 6 covers Bayes’ theorem for discrete random variables using a table. We see two important consequences of the method: multiplying the prior by a constant, or multiplying the likelihood by a constant, does not affect the resulting posterior distribution. This gives us the proportional form of Bayes’ theorem. We show that we get the same results when we analyze the observations sequentially, using the posterior after the previous observation as the prior for the next observation, as when we analyze the observations all at once using the joint likelihood and the original prior. We demonstrate Bayes’ theorem for binomial observations with a discrete prior and for Poisson observations with a discrete prior.
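
    The equivalence of sequential and all-at-once updating is easy to check numerically. A minimal sketch (made-up discrete prior and Bernoulli observations):

        # Sequential updating gives the same posterior as using the joint likelihood.
        theta <- c(0.2, 0.5, 0.8)                    # possible parameter values
        prior <- c(1/3, 1/3, 1/3)                    # equal prior weights
        obs   <- c(1, 0, 1, 1)                       # made-up Bernoulli observations
        post.seq <- prior
        for (y in obs) {                             # update one observation at a time
          post.seq <- post.seq * dbinom(y, size = 1, prob = theta)
          post.seq <- post.seq / sum(post.seq)
        }
        joint    <- prior * sapply(theta, function(t) prod(dbinom(obs, 1, t)))
        post.all <- joint / sum(joint)
        all.equal(post.seq, post.all)                # TRUE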

    Chapter 7 covers continuous random variables, including joint, marginal, and conditional random variables. The beta, gamma, and normal distributions are introduced in this chapter.

    Chapter 8 covers Bayes’ theorem for the population proportion (binomial) with a continuous prior. We show how to find the posterior distribution of the population proportion using either a uniform prior or a beta prior. We explain how to choose a suitable prior. We look at ways of summarizing the posterior distribution.
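
    The conjugate update behind this chapter is standard: a beta(a, b) prior combined with y successes in n trials gives a beta(a + y, b + n − y) posterior. A minimal sketch with made-up numbers:

        # Beta prior updated by binomial data: posterior is beta(a + y, b + n - y).
        a <- 1; b <- 1                           # uniform prior is beta(1, 1)
        y <- 7; n <- 20                          # made-up data: 7 successes in 20 trials
        post.a <- a + y
        post.b <- b + n - y
        post.a / (post.a + post.b)               # posterior mean of the proportion
        qbeta(c(0.025, 0.975), post.a, post.b)   # equal-tail 95% credible interval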

    Chapter 9 compares the Bayesian inferences with the frequentist inferences. We show that the Bayesian estimator (posterior mean using a uniform prior) has better performance than the frequentist estimator (sample proportion) in terms of mean squared error over most of the range of possible values. This kind of frequentist analysis is useful before we perform our Bayesian analysis. We see the Bayesian credible interval has a much more useful interpretation than the frequentist confidence interval for the population proportion. One-sided and two-sided hypothesis tests using Bayesian methods are introduced.
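
    The mean squared error comparison described here can be written in closed form: the sample proportion y/n has MSE = π(1 − π)/n, while the posterior mean (y + 1)/(n + 2) under a uniform prior has MSE = [nπ(1 − π) + (1 − 2π)²]/(n + 2)². A short sketch plotting the two curves (the sample size is an arbitrary choice):

        # Exact MSE of the frequentist and Bayesian estimators of a proportion.
        n <- 10                                        # illustrative sample size
        p <- seq(0, 1, by = 0.01)                      # range of possible true values
        mse.freq  <- p * (1 - p) / n                   # sample proportion y/n
        mse.bayes <- (n * p * (1 - p) + (1 - 2 * p)^2) / (n + 2)^2   # (y + 1)/(n + 2)
        plot(p, mse.freq, type = "l", xlab = "true proportion", ylab = "MSE")
        lines(p, mse.bayes, lty = 2)                   # dashed curve: posterior mean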

    Chapter 10 covers Bayes’ theorem for Poisson observations with a continuous prior. The prior distributions used include the positive uniform, the Jeffreys’ prior, and the gamma prior. Bayesian inferences for the Poisson parameter using the resulting posterior include Bayesian credible intervals and two-sided tests of hypothesis, as well as one-sided tests of hypothesis.

    Chapter 11 covers Bayes’ theorem for the mean of a normal distribution with known variance. We show how to choose a normal prior. We discuss dealing with nuisance parameters by marginalization. The predictive density of the next observation is found by considering the population mean a nuisance parameter and marginalizing it out.
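
    The conjugate update used in this chapter is also standard: with a normal prior (mean m, variance s²) and n observations having known variance σ², the posterior precision is 1/s² + n/σ², and the posterior mean is the precision-weighted average of the prior mean and the sample mean. A minimal sketch with made-up numbers:

        # Normal prior, normal observations with known variance: conjugate update.
        m  <- 10; s2 <- 4                        # prior mean and prior variance (made up)
        sigma2 <- 9                              # known observation variance (made up)
        x  <- c(12.3, 11.1, 13.0, 12.4, 11.8)    # made-up data
        n  <- length(x)
        post.prec <- 1 / s2 + n / sigma2         # posterior precision
        post.var  <- 1 / post.prec
        post.mean <- post.var * (m / s2 + n * mean(x) / sigma2)
        c(post.mean, sqrt(post.var))             # posterior mean and standard deviation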

    Chapter 12 compares Bayesian inferences with the frequentist inferences for the mean of a normal distribution. These comparisons include point and interval estimation and involve hypothesis tests including both the one-sided and the two-sided cases.

    Chapter 13 shows how to perform Bayesian inferences for the difference between normal means and how to perform Bayesian inferences for the difference between proportions using the normal approximation.

    Chapter 14 introduces the simple linear regression model and shows how to perform Bayesian inferences on the slope of the model. The predictive distribution of the next observation is found by considering both the slope and intercept to be nuisance parameters and marginalizing them out.

    Chapter 15 introduces Bayesian inference for the standard deviation σ, when we have a random sample of normal observations with known mean μ. This chapter is at a somewhat higher level than the previous chapters and requires the use of the change-of-variable formula for densities. Priors used include positive uniform for standard deviation, positive uniform for variance, Jeffreys’ prior, and the inverse chi-squared prior. We discuss how to choose an inverse chi-squared prior that matches our prior belief about the median. Bayesian inferences from the resulting posterior include point estimates, credible intervals, and hypothesis tests including both the one-sided and two-sided cases.

    Chapter 16 shows how we can make Bayesian inference robust against a misspecified prior by using a mixture prior and marginalizing out the mixture parameter. This chapter is also at a somewhat higher level than the others, but it shows how one of the main dangers of Bayesian analysis can be avoided.

    Chapter 17 returns to the problem we discussed in Chapter 11 — that is, of making inferences about the mean of a normal distribution. In this chapter, however, we explicitly model the unknown population standard deviation and show that the approximations we suggested in Chapter 11 are in fact exact. We also deal with the two-sample case so that inference can be performed on the difference between two means.

    Chapter 18 introduces the multivariate normal distribution and extends the theory from Chapters 11 and 17 to the multivariate case. The multivariate normal distribution is essential for the discussion of linear models and, in particular, multiple regression.

    Chapter 19 extends the material from Chapter 14 on simple linear regression to the more familiar multiple regression setting. The methodology for making inference about the usefulness of explanatory variables in predicting the response is given, and the posterior predictive distribution for a new observation is derived.

    Chapter 20 provides a brief introduction to modern computational Bayesian statistics. Computational Bayesian statistics relies heavily on being able to efficiently sample from potentially complex distributions. This chapter gives an introduction to a number of techniques that are used. Readers might be slightly disappointed that we did not cover popular computer programs such as BUGS and JAGS, which have very efficient general implementations of many computational Bayesian methods and tie in well to R. We felt that these topics require almost an entire book in their own right, and as such we could not do justice to them in such a short space.

    Main Points

    An association between two variables does not mean that one causes the other. It may be due to a causal relationship, it may be due to the effect of a third (lurking) variable on both the other variables, or it may be due to a combination of a causal relationship and the effect of a lurking variable.

    Scientific method is a method for searching for cause-and-effect relationships and measuring their strength. It uses controlled experiments, where outside factors that may affect the measurements are controlled. This isolates the relationship between the two variables from the outside factors, so the relationship can be determined.

    Statistical methods extend the scientific method to cases where the outside factors are not identified and hence cannot be controlled. The principle of randomization is used to statistically control these unidentified outside factors by averaging out their effects. This contributes to variability in the data.

    We can use the probability model (based on the randomization method) to measure the uncertainty.

    The frequentist approach to statistics considers the parameter to be a fixed but unknown constant. The only kind of probability allowed is long-run relative frequency. These probabilities are only for observations and sample statistics, given the unknown parameters. Statistical procedures are judged by how they perform in an infinite number of hypothetical repetitions of the experiment.

    The Bayesian approach to statistics allows the parameter to be considered a random variable. Probabilities can be calculated for parameters as well as observations and sample statistics. Probabilities calculated for parameters are interpreted as degree of belief and must be subjective. The rules of probability are used to revise our beliefs about the parameters, given the data.

    A frequentist estimator is evaluated by looking at its sampling distribution for a fixed parameter value and seeing how it is distributed over all possible repetitions of the experiment.

    If we look at the sampling distribution of a Bayesian estimator for a fixed parameter value, it is called pre-posterior analysis since it can be done prior to taking the data.

    A Monte Carlo study is where we perform the experiment a large number of times and calculate the statistic for each experiment. We use the empirical distribution of the statistic over all the samples we took in our study instead of its sampling distribution over all possible repetitions.

    CHAPTER 2

    SCIENTIFIC DATA GATHERING

    Scientists gather data purposefully, in order to find answers to particular questions. Statistical science has shown that data should be relevant to the particular questions, yet be gathered using randomization. The development of methods to gather data purposefully, yet using randomization, is one of the greatest contributions the field of statistics has made to the practice of science.

    Variability in data solely due to chance can be averaged out by increasing the sample size. Variability due to other causes cannot be. Statistical methods have been developed for gathering data randomly, yet relevant to a specific question. These methods can be divided into two fields. Sample survey theory is the study of methods for sampling from a finite real population. Experimental design is the study of methods for designing experiments that focus on the desired factors and that are not affected by other possibly unidentified ones.

    Inferences always depend on the probability model which we assume generated the observed data being the correct one. When data are not gathered randomly, there is a risk that the observed pattern is due to lurking variables that were not observed, instead of being a true reflection of the underlying pattern. In a properly designed experiment, treatments are assigned to subjects in such a way as to reduce the effects of any lurking variables that are present, but unknown to us.

    When we make inferences from data gathered according to a properly designed random survey or experiment, the probability model for the observations follows from the design of the survey or experiment, and we can be confident that it is correct. This puts our inferences on a solid foundation. On the other hand, when we make inferences from data gathered from a non-random design, we do not have any underlying justification for the probability model; we just assume it is true! There is the possibility that the assumed probability model for the observations is not correct, and our inferences will be on shaky ground.

    2.1 Sampling from a Real Population

    First, we will define some fundamental terms.

    Population. The entire group of objects or people the investigator wants information about. For instance, the population might consist of New Zealand residents over the age of eighteen. Usually we want to know some specific attribute about the population. Each member of the population has a number associated with it — for example, his/her annual income. Then we can consider the model population to be the set of numbers for each individual in the real population. Our model population would be the set of incomes of all New Zealand residents over the age of eighteen. We want to learn about the distribution of the population. Specifically, we want information about the population parameters, which are numbers associated with the distribution of the population, such as the population mean, median, and standard deviation. Often it is not feasible to get information about all the units in the population. The population may be too big, or spread over too large an area, or it may cost too much to obtain data for the complete population. So we do not know the parameters because it is infeasible to calculate them.

    Sample. A subset of the population. The investigator draws one sample from the population and gets information from the individuals in that sample. Sample statistics are calculated from sample data. They are numerical characteristics that summarize the distribution of the sample, such as the sample mean, median, and standard deviation. A statistic has a similar relationship to a sample that a parameter has to a population. However, the sample is known, so the statistic can be calculated.

    Statistical inference. Making a statement about population parameters on the basis of sample statistics. Good inferences can be made if the sample is representative of the population as a whole! The distribution of the sample must be similar to the distribution of the population from which it came! Sampling bias, a systematic tendency to collect a sample which is not representative of the population, must be avoided. It would cause the distribution of the sample to be dissimilar to that of the population, and thus lead to very poor inferences.

    Even if we are aware of something about the population and try to represent it in the sample, there are probably some other factors in the population that we are unaware of, and the sample could end up being nonrepresentative with respect to those factors.

    EXAMPLE 2.1

    Suppose we are interested in estimating the proportion of Hamilton voters who approve of the Hamilton City Council financing a new rugby stadium. We decide to go downtown one lunch break and draw our sample from people passing by. We might decide that our sample should be balanced between males and females in the same way as the voting-age population. We might get a sample evenly balanced between males and females, but not be aware that the people we interview during the day are mainly those on the street during working hours. Office workers would be overrepresented, while factory workers would be underrepresented. There might be other biases inherent in choosing our sample this way, and we might not have a clue as to what these biases are. Some groups would be systematically underrepresented, and others systematically overrepresented. We cannot make our sample representative for classifications we do not know.

    Surprisingly, random samples give more representative samples than any non-random method such as quota samples or judgment samples. They not only minimize the amount of error in the inference, they also allow a (probabilistic) measurement of the error that remains.

    Simple Random Sampling (without Replacement)

    Simple random sampling requires a sampling frame, which is a list of the population numbered from 1 to N. A sequence of n random numbers is drawn from the numbers 1 to N. Each time a number is drawn, it is removed from consideration so that it cannot be drawn again. The items on the list corresponding to the chosen numbers are included in the sample. Thus, at each draw, each item not yet selected has an equal chance of being selected. Every item has an equal chance of being in the final sample. Furthermore, every possible sample of the required size is equally likely.
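
    In R, a simple random sample without replacement from a numbered sampling frame is a single call (the frame size and sample size below are arbitrary):

        # Simple random sample without replacement from a frame numbered 1 to N.
        N <- 5000                          # size of the sampling frame (illustrative)
        n <- 100                           # desired sample size
        chosen <- sample(1:N, size = n)    # sample() draws without replacement by default
        head(chosen)                       # frame entries to include in the sample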

    Suppose we are sampling from the population of registered voters in a large city. It is likely that the proportion of males in the sample is close to the proportion of males in the population. Most samples are near the correct proportions; however, we are not certain to get the exact proportion. All possible samples of size n are equally likely, including those that are not representative with respect to sex.

    Stratified Random Sampling

    Given that we know what the proportions of males and females are from the list of voters, we should take that information into account in our sampling method. In stratified random sampling, the population is divided into sub-populations called strata. In our case this would be males and females. The sampling frame would be divided into separate sampling frames for the two strata. A simple random sample is taken from each stratum, where the sample size in each stratum is proportional to the stratum size. Every item has an equal chance of being selected, and every possible sample that has each stratum represented in the correct proportions is equally likely. This method will give us samples that are exactly representative with respect to sex. Hence inferences from these types of samples will be more accurate than those from simple random sampling when the variable of interest has different distributions over the strata. If the variable of interest is the same for all the strata, then stratified random sampling will be no more (and no less) accurate than simple random sampling. Stratification has no potential downside as far as the accuracy of the inference is concerned. However, it is more costly, as the sampling frame has to be divided into separate sampling frames for each stratum.
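
    A stratified draw with proportional allocation might be sketched as follows (the sampling frame, its sex column, and all sizes are made up):

        # Stratified random sampling with proportional allocation, stratified by sex.
        frame <- data.frame(id  = 1:5000,
                            sex = rep(c("male", "female"), times = c(2400, 2600)))
        n.total <- 100
        strata  <- split(frame$id, frame$sex)                 # separate frame per stratum
        sizes   <- round(n.total * sapply(strata, length) / nrow(frame))
        sampled <- unlist(Map(sample, strata, sizes))         # simple random sample per stratum
        table(frame$sex[frame$id %in% sampled])               # check the stratum proportions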

    Cluster Random Sampling

    Sometimes we do not have a good sampling frame of individuals. In other cases the individuals are scattered across a wide area. In cluster random sampling, we divide that area into neighborhoods called clusters. Then we make a sampling frame for clusters. A random sample of clusters is selected. All items in the chosen clusters are included in the sample. This is very cost effective because the interviewer will not have as much travel time between interviews. The drawback is that items in a cluster tend to be more similar than items in different clusters. For instance, people living in the same neighborhood usually come from the same economic level because the houses were built at the same time and in the same price range. This means that each observation gives less information about the population parameters. It is less efficient in terms of sample size. However, often it is very cost effective, because getting a larger sample is usually cheaper by this method.

    Non-sampling Errors in Sample Surveys

    Errors can arise in sample surveys or in a complete population census for reasons other than the sampling method used. These non-sampling errors include response bias; the people who respond may be somewhat different than those who do not respond. They may have different views on the matters surveyed. Since we only get observations from those who respond, this difference would bias the results. A well-planned survey will have callbacks, where those in the sample who have not responded will be contacted again, in order to get responses from as many people in the original sample as possible. This will entail additional costs, but is important as we have no reason to believe that nonrespondents have the same views as the respondents. Errors can also arise from poorly worded questions. Survey questions should be trialed in a pilot study to determine if there is any ambiguity.

    Randomized Response Methods

    Social science researchers and medical researchers often wish to obtain information about the population as a whole, but the information that they wish to obtain is sensitive to the individuals who are surveyed. For instance, the distribution of the number of sex partners over the whole population would be indicative of the overall population risk for sexually transmitted diseases. Individuals surveyed may not wish to divulge this sensitive personal information. They might refuse to respond or, even worse, they could give an untruthful answer. Either way, this would threaten the validity of the survey results. Randomized response methods have been developed to get around this problem. There are two questions, the sensitive question and the dummy question. Both questions have the same set of answers. The respondent uses a randomization that selects which question he or she answers, and also the answer if the dummy question is selected. Some of the answers in the survey data will be to the sensitive question and some will be to the dummy question. The interviewer will not know which is which. However, the answers to the dummy question enter the data with known randomization probabilities. This way, information about the population can be obtained without actually knowing the personal information of the individuals surveyed, since only that individual knows which question he or she answered. Bolstad, Hunt, and McWhirter (2001) describe a Sex, Drugs, and Rock & Roll Survey that gets sensitive information about a population (an Introduction to Statistics class) using randomized response methods.
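
    To see how the known randomization probabilities let us recover population-level information, suppose each respondent answers the sensitive question with probability p and otherwise answers a dummy question whose yes-rate q is fixed by the randomization device. Then P(yes) = p·π + (1 − p)·q, so the sensitive proportion π can be estimated from the overall yes-proportion. A small simulation sketch (all numbers are made up):

        # Randomized response: recover the sensitive proportion from pooled answers.
        set.seed(2)
        p       <- 0.7     # probability the device selects the sensitive question
        q       <- 0.5     # known yes-rate when the dummy question is selected
        pi.true <- 0.2     # true sensitive proportion (unknown in practice)
        m       <- 2000    # number of respondents
        sensitive <- rbinom(m, 1, p)                     # which question each person answers
        answer    <- ifelse(sensitive == 1,
                            rbinom(m, 1, pi.true),       # truthful sensitive answer
                            rbinom(m, 1, q))             # dummy answer with known yes-rate
        (mean(answer) - (1 - p) * q) / p                 # estimate of the sensitive proportion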

    2.2 Observational Studies and Designed Experiments

    The goal of scientific inquiry is to gain new knowledge about the cause-and-effect relationship between a factor and a response variable. We gather data to help us determine these relationships and to develop mathematical models to explain them. The world is complicated. There are many other factors that may affect the response. We may not even know what these other factors are. If we do not know what they are, we cannot control them directly. Unless we can control them, we cannot make inferences about cause-and-effect relationships! Suppose, for example, we want to study a herbal medicine for its effect on weight loss. Each person in the study is an experimental unit. There is great variability between experimental units, because people are all unique individuals with their own hereditary body chemistry and dietary and exercise habits. The variation among experimental units makes it more difficult to detect the effect of a treatment. Figure 2.1 shows a collection of experimental units. The degree of shading shows they are not the same with respect to some unidentified variable. The response variable in the experiment may depend on that unidentified variable, which could be a lurking variable in the experiment.

    Diagram shows experimental units numbered from 1 to 24 arranged in ascending order in seven rows; each odd-numbered row contains three units and each even-numbered row contains four.

    Figure 2.1 Variation among experimental units.

    Observational Study

    If we record the data on a group of subjects that decided to take the herbal medicine and compare them with data from a control group who did not, that would be an observational study. The treatments have not been randomly assigned to treatment and control groups. Instead the subjects self-select. Even if we observe a substantial difference between the two groups, we cannot conclude that there is a causal relationship from an observational study. We cannot rule out that the association was due to an unidentified lurking variable. In our study, those who took the treatment may have been more highly motivated to lose weight than those who did not. Or there may be other factors that differed between the two groups. Any inferences we make from an observational study depend on the assumption that there are no differences between the distributions of the units in the treatment group and the control group. We cannot know whether this assumption is actually correct in an observational study.

    Designed Experiment

    We need to get our data from a designed experiment if we want to be able to make sound inferences about cause-and-effect relationships. The experimenter uses randomization to decide which subjects get into the treatment group(s) and control group respectively. For instance, he/she uses a table of random numbers, or flips a coin.

    We are going to divide the experimental units into four treatment groups (one of which may be a control group). We must ensure that each group gets a similar range of units. If we do not, we might end up attributing a difference between treatment groups to the different treatments, when in fact it was due to the lurking variable and a biased assignment of experimental units to treatment groups.

    Completely randomized
