Statistical Learning in Genetics: An Introduction Using R
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
This book evolved from a set of notes written for a graduate course on Likelihood
and Bayesian Computations held at Aarhus University in 2016 and 2018. The
audience was life-science PhD students and post-docs with a background in either
biology, agriculture, medicine or epidemiology, who wished to develop analytic
skills to perform genomic research. This book is addressed to this audience of
numerate biologists, who, despite an interest in quantitative methods, lack the
formal mathematical background of the professional statistician. For this reason,
I offer considerably more detail in explanations and derivations than may be needed
for a more mathematically oriented audience. Nevertheless, some mathematical
and statistical prerequisites are needed in order to extract maximum benefit from
the book. These include introductory courses on calculus, linear algebra and
mathematical statistics, as well as a grounding in linear and nonlinear regression and
mixed models. Applied statistics and biostatistics students may also find the book
useful, but may wish to browse hastily through the introductory chapters describing
likelihood and Bayesian methods.
I have endeavoured to write in a style that appeals to the quantitative biologist,
while remaining concise and using examples profusely. The intention is to cover
ground at a good pace, facilitating learning by interconnecting theory with examples
and providing exercises with their solutions. Many exercises involve programming
with R, an open-source statistical software environment that can be downloaded
and used with the free graphical user interface RStudio. Most of today’s students
are competent in R and there are many tutorials online for the uninitiated. The
R-code needed to solve the exercises is provided in all cases and is written,
with few exceptions, with the objective of being transparent rather than efficient.
The reader has the opportunity to run the code and to modify input parameters
in an experimental fashion. This hands-on computing contributes to a better
understanding of the underlying theory.
The first objective of this introduction is to provide readers with an understanding
of the techniques used for analysis of data, with emphasis on genetic data. The
second objective is to teach them to implement these techniques. Meeting these
objectives is an initial step towards acquiring the skills needed to perform data-
how a measure of uncertainty can be attached to accuracy. The body of the chapter
deals with prediction from a classical/frequentist perspective. Bayesian prediction
is illustrated in several examples throughout the book and particularly in Chap. 10.
In Chap. 6, many important ideas related to prediction are illustrated using a simple
least-squares setting, where the number of records n is larger than the number of
parameters p of the model; this is the n > p setup. However, in many modern
genetic problems, the number of parameters greatly exceeds the number of records;
the p ≫ n setup. This calls for some form of regularisation, a topic introduced
in Chap. 7 under the heading Shrinkage Methods. After an introduction to ridge
regression, the chapter provides a description of the lasso (least absolute shrinkage
and selection operator) and of a Bayesian spike and slab model. The spike and
slab model can be used both for prediction and for discovery of relevant covariates
that have an effect on the records. In a genetic context, these covariates could be
observed genetic markers, and the challenge is to find as many promising markers
as possible among the hundreds of thousands available, while incurring a low
proportion of false positives. This leads to the topic reviewed in Chap. 8: False Discovery
Rate. The subject is first presented from a frequentist perspective as introduced
by Benjamini and Hochberg in their highly acclaimed work, and is also discussed
using empirical Bayesian and fully Bayesian approaches. The latter is implemented
within an McMC environment using the spike and slab model as the driving engine.
The complete marginal posterior distribution of the false discovery rate can be
obtained as a by-product of the McMC algorithm. Chapter 9 describes some of
the technical details associated with prediction for binary data. The topics discussed
include logistic regression for the analysis of case-control studies, where the data are
collected in a non-random fashion, penalised logistic regression, lasso and spike and
slab models implemented for the analysis of binary records, area under the curve
(AUC) and prediction of a genetic disease of an individual, given information on
the disease status of its parents. The chapter concludes with an appendix providing
technical details for an approximate analysis of binary traits. The approximation
can be useful as a first step, before launching the full McMC machinery of a more
formal approach. Chapter 10 deals with Bayesian prediction, where many of the
ideas scattered in various parts of the book are brought into focus. The chapter
discusses the sources of uncertainty of predictors from a Bayesian and frequentist
perspective and how they affect accuracy of prediction as measured by the Bayesian
and frequentist expectations of the sample mean squared error of prediction. The
final part of the chapter introduces, via an example, how specific aspects of a
Bayesian model can be tested using posterior predictive simulations, a topic that
combines frequentist and Bayesian ideas. Chapter 11 completes Part II and provides
an overview of selected nonparametric methods. After an introduction to traditional
nonparametric models, such as the binned estimator and kernel smoothing methods,
the chapter concentrates on four more recent approaches: kernel methods using basis
expansions, neural networks, classification and regression trees, and bagging and
random forests.
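The shrinkage theme of Chap. 7 can be given a small hands-on taste in base R. The sketch below (simulated data; the sample sizes and the penalty values are invented for illustration, not taken from the book) computes the closed-form ridge regression solution in a p > n setting, where ordinary least squares has no unique solution:

```r
set.seed(1)
n <- 50; p <- 200                       # more covariates than records: p > n
X <- matrix(rnorm(n * p), n, p)
beta <- c(rnorm(5), rep(0, p - 5))      # only the first 5 covariates have an effect
y <- X %*% beta + rnorm(n)

# With p > n, crossprod(X) = t(X) %*% X is singular and the normal
# equations have no unique solution. Ridge regression adds lambda * I,
# making the system solvable and shrinking all estimates towards zero.
lambda <- 10
beta_ridge <- solve(crossprod(X) + lambda * diag(p), crossprod(X, y))

# A larger penalty shrinks harder: the squared norm of the estimates decreases.
beta_ridge2 <- solve(crossprod(X) + 100 * diag(p), crossprod(X, y))
c(sum(beta_ridge^2), sum(beta_ridge2^2))
```

Increasing the penalty trades variance for bias; how to choose it in practice is part of the material of Chap. 7.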
Part III of the book consists of exercises and their solutions. The exercises
(Chap. 12) are designed to provide the reader with deeper insight into the subjects
discussed in the body of the book. A complete set of solutions, many involving
programming, is available in Chap. 13.
The majority of the datasets used in the book are simulated and intended to illustrate
important features of real-life data. The size of the simulated data is kept within the
limits necessary to obtain solutions in reasonable CPU time, using straightforward
R-code, although the reader may modify the size by changing input parameters.
Advanced computational techniques required for the analysis of very large datasets
are not addressed. This subject requires a specialised treatment beyond the scope of
this book.
The book has not had the benefit of having been used as material in repeated
courses by a critical mass of students, who invariably stimulate new ideas, help with
a deeper understanding of old ones and, not least, spot errors in the manuscript and
in the problem sections. Despite these shortcomings, the book is complete and out
of my hands. I hope the critical reader will make me aware of the errors. These
will be corrected and listed on the web at https://github.com/SorensenD/SLGDS.
The GitHub site also contains most of the R-codes used in the book, which can be
downloaded, as well as notes that include comments, clarifications or additions of
themes discussed in the book.
Many friends and colleagues have assisted in a variety of ways. Bernt Guldbrandtsen
(University of Copenhagen) has been a stable helping hand and helping mind.
Bernt has generously shared his deep biological and statistical knowledge with
me on many, many occasions, and also provided endless advice on LaTeX and
Markdown issues and on programming details, always with good spirits and patience.
I owe much to him. Ole Fredslund Christensen (Aarhus University) read several
chapters and wrote a meticulous list of corrections and suggestions. I am very
grateful to him for this effort. Gustavo de los Campos (Michigan State University)
has shared software codes and tricks and contributed with insight in many parts
of the book, particularly in Prediction and Kernel Methods. I have learned much
during the years of our collaboration. Parts of the book were read by Andres Legarra
(INRA), Miguel Pérez Enciso (University of Barcelona), Bruce Walsh (University
of Arizona), Rasmus Waagepetersen (Aalborg University), Peter Sørensen (Aarhus
University), Kenneth Enevoldsen (Aarhus University), Agustín Blasco (Universidad
Politécnica de Valencia), Jens Ledet Jensen (Aarhus University), Fabio Morgante
(Clemson University), Doug Speed (Aarhus University), Bruce Weir (University
of Washington), Rohan Fernando (retired from Iowa State University) and Daniel
Gianola (retired from the University of Wisconsin-Madison). I received many
helpful comments, suggestions and corrections from them. However, I am solely
responsible for the errors that escaped scrutiny. I would be thankful if I could be
made aware of these errors.
I acknowledge Eva Hiripi, Senior Editor, Statistics Books, Springer, for consistent
support during this project.
I am the grateful recipient of many gifts from my wife Pia. One has been essential
for concentrating on my task: happiness.
Contents
1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 The Sampling Distribution of a Random Variable . . . . . . . . . . . . . . . . . . 3
1.3 The Likelihood and the Maximum Likelihood Estimator . . . . . . . . . . 5
1.4 Incorporating Prior Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Frequentist or Bayesian? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.6 Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.7 Appendix: A Short Overview of Quantitative Genomics. . . . . . . . . . . 32
Part II Prediction
6 Fundamentals of Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
6.1 Best Predictor and Best Linear Predictor. . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
6.2 Estimating the Regression Function in Practice: Least Squares . . . 263
6.3 Overview of Things to Come . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
6.4 The Bias-Variance Trade-Off . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
6.5 Estimation of Validation MSE of Prediction in Practice . . . . . . . . . . . 280
6.6 On Average Training MSE Underestimates Validation MSE . . . . . . 284
6.7 Least Squares Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
7 Shrinkage Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299
7.1 Ridge Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 300
7.2 The Lasso . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
7.3 An Extension of the Lasso: The Elastic Net . . . . . . . . . . . . . . . . . . . . . . . . 319
7.4 Example: Prediction Using Ridge Regression and Lasso . . . . . . . . . . 319
7.5 A Bayesian Spike and Slab Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321
8 Digression on Multiple Testing: False Discovery Rates. . . . . . . . . . . . . . . . . 333
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
8.2 Preliminaries. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675
Author Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 683
Subject Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
Chapter 1
Overview
1.1 Introduction
10. Are there other non-genetic factors that affect the traits, such as smoking
behaviour, alcohol consumption, blood pressure measurements, body mass
index and level of physical exercise?
11. Could the predictive ability of the genetic score be improved by incorporation
of these non-genetic sources of information, either additively or considering
interactions? What is the relative contribution from the different sources of
information?
The first question was the focus of quantitative genetics for many years, long
before the so-called genomic revolution, that is, before breakthroughs in molecular
biology made the sequencing of whole genomes technically and economically
feasible, resulting in hundreds of thousands or millions of genetic markers
(single nucleotide polymorphisms (SNPs)) for each individual in the data set. Until
the end of the twentieth century before dense genetic marker data were available,
genetic variation of a given trait was inferred using resemblance between relatives.
This requires equating the expected proportion of genotypes shared identical
by descent, given a pedigree, with the observed phenotypic correlation between
relatives. The fitted models also retrieve “estimates of random effects”, the predicted
genetic values that act as genetic scores and are used in selection programs of farm
animals and plants.
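The classical approach just described, equating expected genetic resemblance with observed phenotypic resemblance, can be sketched in a few lines of R. In the toy simulation below (all parameter values are invented for illustration), phenotypes follow a simple additive model with phenotypic variance 1, and the regression of offspring phenotype on the midparent average estimates the heritability, here set to 0.5:

```r
set.seed(2)
nfam <- 5000
h2 <- 0.5                                   # true heritability (illustrative value)
# Additive model: phenotype = additive genetic value + environment
a_sire <- rnorm(nfam, 0, sqrt(h2))
a_dam  <- rnorm(nfam, 0, sqrt(h2))
y_sire <- a_sire + rnorm(nfam, 0, sqrt(1 - h2))
y_dam  <- a_dam  + rnorm(nfam, 0, sqrt(1 - h2))
# Offspring genetic value: parental average plus Mendelian sampling
a_off <- 0.5 * (a_sire + a_dam) + rnorm(nfam, 0, sqrt(h2 / 2))
y_off <- a_off + rnorm(nfam, 0, sqrt(1 - h2))

# The expected regression of offspring phenotype on midparent equals h2
midparent <- 0.5 * (y_sire + y_dam)
fit <- lm(y_off ~ midparent)
coef(fit)["midparent"]                      # close to the true value 0.5
```

The slope recovers the heritability because the covariance between offspring and midparent phenotypes is h2/2 while the variance of the midparent average is 1/2, so their ratio is h2.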
Answers to questions 2–7 would provide insight into genetic architecture and
thereby, into the roots of many complex traits and diseases. This has important
practical implications for drug therapies targeted to particular metabolic pathways,
for personalised medicine and for improved prediction. These questions could not
be sensibly addressed before dense marker data became available (perhaps with
the exception provided by complex segregation analysis that allowed searching for
single genes).
Shortly after a timid start where use of low-density genetic marker information
made its appearance, the first decade of the twenty-first century saw the construction
of large biomedical databases that could be accessed for research purposes where
health information was collected. One such database was the British 1958 cohort
study, including medical records from approximately 3000 individuals genotyped
for one million SNPs. These data provided for the first time the opportunity to begin
addressing questions 2–7. However, a problem had to be faced: how to fit and
validate a model with one million unknowns to a few thousand records and how to
find a few promising genetic markers from the million available avoiding a large
proportion of false positives? This resulted in a burst of activity in the fields of
computer science and statistics, leading to development of a methodology designed
to meet the challenges posed by Big Data.
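One of these challenges, finding a few promising markers among many while limiting false positives, is addressed by the Benjamini and Hochberg adjustment treated in Chap. 8, which is available in base R through p.adjust. The sketch below (simulated test statistics; the numbers of markers and the effect size are invented for illustration) flags markers at a nominal false discovery rate of 5% and checks the realised false discovery proportion:

```r
set.seed(3)
m  <- 10000                                  # markers tested
m1 <- 100                                    # markers with a true effect
z  <- c(rnorm(m - m1), rnorm(m1, mean = 4))  # test statistics; true nulls first
pval <- 2 * pnorm(-abs(z))                   # two-sided p-values

# Benjamini-Hochberg controls the expected proportion of false
# discoveries among the markers declared significant
padj <- p.adjust(pval, method = "BH")
discoveries <- which(padj < 0.05)

# Positions 1 to m - m1 are true nulls, so this is the realised
# false discovery proportion among the declared discoveries
fdp <- mean(discoveries <= m - m1)
c(n_discoveries = length(discoveries), fdp = fdp)
```

Chapter 8 develops the frequentist theory behind this procedure as well as its empirical and fully Bayesian counterparts.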
In recent years, the amount of information in modern data sets has grown to
formidable proportions, and the challenges have not diminished. One
example is the UK Biobank that provides a wealth of health information
from half a million UK participants. The database is regularly updated and
a team of scientists recently reported that the complete exome sequence was
completed (about 2% of the genome involved in coding for proteins and