
Statistical Data Analysis 2024/25

Lecture Week 1
London Postgraduate Lectures on Particle Physics
University of London MSc/MSci course PH4515

Glen Cowan
Physics Department
Royal Holloway, University of London
[email protected]
www.pp.rhul.ac.uk/~cowan

Course web page via RHUL Moodle (PH4515) and also
www.pp.rhul.ac.uk/~cowan/stat_course.html

G. Cowan / RHUL Physics Statistical Data Analysis / lecture week 1 1


Statistical Data Analysis
Lecture 1-1

• Course structure and policies
• Outline of topics
• Some resources



Course Info (1)
This year’s lectures:
Mondays 2-5 pm starting 30 Sep., no reading week.
First two hours are the primary lectures; the 3rd hour is a discussion
session (additional examples, discussion of problem sheets,
computing tools, etc.).
Venue: Stewart House 2/3, 32 Russell Sq, London WC1B 5DN
There will be 9 problem sheets.
Some paper & pencil, some computing problems.
Due weekly, Mondays 4 pm, from lecture week 3 through week 11.
Late submissions according to College policy (10% off if up to 24 h late,
then no credit unless agreed).



Course Info (2)
For problem sheets, scan/merge into a single PDF file.
Filename: YourName_stat_prob_sheet_n.pdf (n = 1,2,…)
Please do not send hi-res photos from your phone (use iScanner or similar).
MSc/MSci students: upload on Moodle.
PhD students: email to [email protected] with exact subject line:
statistics problem sheet n (n = 1,2,...)
For MSc/MSci students, written exam at end of year (May 2025).
MSc/MSci: exam worth 80%, problem sheets 20%.
For PhD students, no statistics exam.



Computing
The coursework includes short computer programs.
Some choice of language; best to use Python (version 3).
Also possible to use C++ in a Linux environment. This requires
specific software (ROOT and its class library); you cannot just use
e.g. Visual C++.
PhD students can use their own accounts; the usual HEP
setup should be OK.
MSc/MSci students who want to use C++ should request an
account on the RHUL Linux cluster. You open an X Window on your
local machine (e.g. laptop), and from there you log in remotely to RHUL.
For Mac, install XQuartz from www.xquartz.org and open
a terminal window. For Windows, there are various options, e.g.,
MobaXterm or Cygwin/X (more info on the web page).
Course Outline
1 Probability, Bayes’ theorem
2 Random variables and probability densities
3 Expectation values, error propagation
4 Catalogue of pdfs
5 The Monte Carlo method
6 Statistical tests: general concepts
7 Test statistics, multivariate methods
8 Goodness-of-fit tests
9 Parameter estimation, maximum likelihood
10 More maximum likelihood
11 Method of least squares
12 Interval estimation, setting limits
13 Nuisance parameters, systematic uncertainties
14 Examples of Bayesian approach



Some statistics books, papers, etc.
G. Cowan, Statistical Data Analysis, Clarendon, Oxford, 1998.
R.J. Barlow, Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences, Wiley, 1989.
I. Narsky and F.C. Porter, Statistical Analysis Techniques in Particle Physics, Wiley, 2014.
L. Lista, Statistical Methods for Data Analysis in Particle Physics, Springer, 2017.
L. Lyons, Statistics for Nuclear and Particle Physics, CUP, 1986.
F. James, Statistical and Computational Methods in Experimental Physics, 2nd ed., World Scientific, 2006.
S. Brandt, Statistical and Computational Methods in Data Analysis, Springer, New York, 1998.
R.L. Workman et al. (Particle Data Group), Prog. Theor. Exp. Phys. 2022, 083C01; pdg.lbl.gov sections on probability, statistics, MC.
Statistical Data Analysis
Lecture 1-2

• Tasks of statistical data analysis in science
• The role of uncertainty
• Definition of probability



Theory Statistics Experiment
Theory (model, hypothesis) + response of measurement apparatus = model prediction
Experiment (observation) → data



Some tasks of statistical data analysis

Compare data to predictions of competing models
Most models contain adjustable parameters
(e.g., particle physics: $G_F$, $M_Z$, $\alpha_s$, $m_H$, ...)
Estimate (measure) the unknown parameters
Quantify uncertainty in parameter estimates
Test and quantify the extent to which the model is in agreement
with the data.



Uncertainty
Uncertainty enters on several levels
Measurements not in general exactly reproducible
Quantum effects
Random effects (even without QM)
Model prediction uncertain
Approximations used to extract theoretical prediction
Modeling of apparatus

Quantify the uncertainty using PROBABILITY



A definition of probability
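Consider a set S (the sample space).
The interpretation of its elements is left open;
S could be e.g. the set of outcomes of a
repeatable observation.
Label subsets of S as A, B, ... and assign to each A ⊂ S a real number P(A)
satisfying the Kolmogorov axioms (1933):
1. For all A ⊂ S, $P(A) \ge 0$
2. $P(S) = 1$
3. If $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$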



Properties of Probability
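From the axioms of probability, further properties can be
derived, e.g. (with $\bar{A}$ the complement of A in S),
$P(\bar{A}) = 1 - P(A)$
$P(A \cup \bar{A}) = 1$
$P(\emptyset) = 0$
If $A \subset B$, then $P(A) \le P(B)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$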
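As a quick numerical sanity check (not a proof), the axioms and a few standard derived properties can be verified by brute force on a small finite sample space. The sketch below assumes a fair die, i.e. S = {1, ..., 6} with uniform probability; the helper names `P` and `subsets` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

# Assumed example: one roll of a fair die, uniform probability on S.
S = frozenset(range(1, 7))

def P(A):
    """Probability of a subset A of S under the uniform measure."""
    return Fraction(len(A), len(S))

def subsets(s):
    """All subsets of s (the power set), as frozensets."""
    items = sorted(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))]

all_subsets = subsets(S)

# Check the axioms and some derived properties on every pair of subsets.
for A in all_subsets:
    A_bar = S - A                                   # complement of A in S
    assert P(A) >= 0                                # axiom 1
    assert P(A_bar) == 1 - P(A)                     # complement rule
    for B in all_subsets:
        if not (A & B):
            assert P(A | B) == P(A) + P(B)          # axiom 3 (disjoint sets)
        if A <= B:
            assert P(A) <= P(B)                     # monotonicity
        assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion
assert P(S) == 1 and P(frozenset()) == 0            # axiom 2 and P(empty set) = 0

print(len(all_subsets), "subsets checked")  # 64 subsets checked
```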



Conditional probability
Start with sample space S (e.g., set of outcomes), then restrict
to a subset B (with P(B) ≠ 0).
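Define conditional probability of A given B (~4th axiom):
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
E.g. rolling a die, outcome n = 1, 2, ..., 6:
$$P(n < 3 \,|\, n \text{ even}) = \frac{P((n < 3) \cap n \text{ even})}{P(n \text{ even})} = \frac{1/6}{3/6} = \frac{1}{3}$$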
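The definition of conditional probability can be checked by enumerating outcomes. A minimal sketch, assuming a fair die (uniform probability on n = 1, ..., 6) and exact fractions; the events n < 3 and n even and the helper `P_cond` are chosen for illustration.

```python
from fractions import Fraction

S = set(range(1, 7))  # assumed: fair die, uniform probability

def P(A):
    """Probability of a subset A of S."""
    return Fraction(len(A), len(S))

def P_cond(A, B):
    """Conditional probability P(A|B) = P(A ∩ B) / P(B); requires P(B) != 0."""
    if P(B) == 0:
        raise ValueError("P(B) must be nonzero")
    return P(A & B) / P(B)

A = {n for n in S if n < 3}        # event: n < 3
B = {n for n in S if n % 2 == 0}   # event: n even
print(P_cond(A, B))  # 1/3
```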



Independence
Subsets A, B independent if:
$$P(A \cap B) = P(A)\,P(B)$$
If A, B independent,
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(A)\,P(B)}{P(B)} = P(A)$$
I.e. if A, B are independent, imposing one has no effect on the
probability of the other.
N.B. do not confuse with disjoint subsets, i.e., A ∩ B = ∅
E.g. die: A = n even, B = n odd, A ∩ B = ∅
P(A) = ½
P(A|B) = 0
Requiring B affects probability of A, so A, B not independent.
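A small enumeration makes the distinction between independent and disjoint subsets concrete. This sketch again assumes a fair die; the event `low` (n ≤ 2) is an extra example not on the slide, chosen because it happens to be independent of "n even".

```python
from fractions import Fraction

S = set(range(1, 7))  # assumed: fair die, uniform probability

def P(A):
    return Fraction(len(A), len(S))

def independent(A, B):
    """True if P(A ∩ B) == P(A) P(B)."""
    return P(A & B) == P(A) * P(B)

even = {2, 4, 6}
odd = {1, 3, 5}
low = {1, 2}   # n <= 2, illustrative extra event

print(independent(even, odd))  # False: disjoint, P(even ∩ odd) = 0 but P(even)P(odd) = 1/4
print(independent(even, low))  # True: P({2}) = 1/6 = (1/2)(1/3)
```

Note that two disjoint subsets, each with nonzero probability, can never be independent, since P(A ∩ B) = 0 while P(A)P(B) > 0.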



Statistical Data Analysis
Lecture 1-3

• Interpretation of probability
• Bayes’ theorem
• Law of total probability



Interpretation of Probability
I. Relative frequency (→ “frequentist statistics”)
A, B, ... are outcomes of a repeatable experiment

cf. quantum mechanics, particle scattering, radioactive decay...


II. Subjective probability (→ “Bayesian statistics”)
A, B, ... are hypotheses (statements that are true or false)

• Both interpretations consistent with Kolmogorov axioms.


• In particle physics frequency interpretation often most
useful, but subjective probability can provide more natural
treatment of non-repeatable phenomena: systematic
uncertainties, probability that magnetic monopoles exist,...



Bayes’ theorem
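From the definition of conditional probability we have
$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
and
$$P(B|A) = \frac{P(B \cap A)}{P(A)}$$
but $P(A \cap B) = P(B \cap A)$, so
$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)} \qquad \text{(Bayes' theorem)}$$
First published (posthumously) by the Reverend Thomas Bayes (1702−1761):
An essay towards solving a problem in the doctrine of chances, Philos. Trans. R. Soc. 53 (1763) 370.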
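The law of total probability
Consider a subset B of the sample space S, with S divided into
disjoint subsets $A_i$ such that $\cup_i A_i = S$.
[Figure: sample space S divided into disjoint subsets A_i; the subset B overlaps them in the pieces B ∩ A_i.]
$$B = B \cap S = B \cap (\cup_i A_i) = \cup_i (B \cap A_i)$$
$$P(B) = P\big(\cup_i (B \cap A_i)\big) = \sum_i P(B \cap A_i)$$
→ law of total probability
$$P(B) = \sum_i P(B|A_i)\,P(A_i)$$
Bayes' theorem becomes
$$P(A|B) = \frac{P(B|A)\,P(A)}{\sum_i P(B|A_i)\,P(A_i)}$$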
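The law of total probability and the resulting form of Bayes' theorem can be sketched for any finite partition. The example assumes a fair die, a partition A_1 = {1}, A_2 = {2, 3}, A_3 = {4, 5, 6} chosen only for illustration, and B = n even; `total_probability` and `bayes` are hypothetical helper names.

```python
from fractions import Fraction

def total_probability(likelihoods, priors):
    """P(B) = sum_i P(B|A_i) P(A_i) for a partition {A_i} of S."""
    if sum(priors) != 1:
        raise ValueError("probabilities of a partition must sum to 1")
    return sum(l * p for l, p in zip(likelihoods, priors))

def bayes(likelihoods, priors):
    """Posterior P(A_i|B) = P(B|A_i) P(A_i) / P(B) for every A_i in the partition."""
    pB = total_probability(likelihoods, priors)
    return [l * p / pB for l, p in zip(likelihoods, priors)]

# Illustrative setup: fair die, partition of S, and B = "n even"
S = set(range(1, 7))
partition = [{1}, {2, 3}, {4, 5, 6}]
B = {2, 4, 6}

def P(A):
    return Fraction(len(A), len(S))

priors = [P(A) for A in partition]                   # P(A_i)
likelihoods = [P(B & A) / P(A) for A in partition]   # P(B|A_i)

print(total_probability(likelihoods, priors))        # 1/2
print([str(p) for p in bayes(likelihoods, priors)])  # ['0', '1/3', '2/3']
```

The posteriors agree with a direct calculation, e.g. P(A_3|B) = P({4, 6}) / P(B) = (2/6) / (1/2) = 2/3.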



An example using Bayes’ theorem
Suppose the probability (for anyone) to have a disease D is:
$P(D)$, $P(\text{no }D) = 1 - P(D)$ ← prior probabilities, i.e., before any test carried out
Consider a test for the disease: result is + or −
$P(+|D)$, $P(-|D) = 1 - P(+|D)$ ← probabilities to (in)correctly identify a person with the disease
$P(-|\text{no }D)$, $P(+|\text{no }D) = 1 - P(-|\text{no }D)$ ← probabilities to (in)correctly identify a healthy person
Suppose your result is +. How worried should you be?


Bayes’ theorem example (cont.)
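The probability to have the disease given a + result is
$$P(D|+) = \frac{P(+|D)\,P(D)}{P(+|D)\,P(D) + P(+|\text{no }D)\,P(\text{no }D)} = 0.032 \qquad \text{← posterior probability}$$
i.e. you're probably OK!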


Your viewpoint: my degree of belief that I have the disease is 3.2%.
Your doctor’s viewpoint: 3.2% of people like this have the disease.
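The disease example can be reproduced numerically. The slide's input values are not recoverable from this text, so the prior and test probabilities below are assumptions picked for illustration (they give a posterior of about 3.2%, matching the figure quoted above); `posterior_disease` is a hypothetical helper.

```python
def posterior_disease(p_D, p_pos_given_D, p_pos_given_noD):
    """P(D|+) from Bayes' theorem, with P(+) from the law of total probability."""
    p_noD = 1.0 - p_D
    p_pos = p_pos_given_D * p_D + p_pos_given_noD * p_noD   # P(+)
    return p_pos_given_D * p_D / p_pos

# Assumed illustrative inputs: P(D) = 0.001, P(+|D) = 0.98, P(+|no D) = 0.03
p = posterior_disease(p_D=0.001, p_pos_given_D=0.98, p_pos_given_noD=0.03)
print(f"P(D|+) = {p:.3f}")  # P(D|+) = 0.032
```

The small posterior comes from the small prior: almost all + results are false positives from the much larger healthy population.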



Frequentist Statistics − general philosophy
In frequentist statistics, probabilities are associated only with
the data, i.e., outcomes of repeatable observations (shorthand: x).
Probability = limiting frequency
Probabilities such as
P(string theory is true),
P(0.117 < $\alpha_s$ < 0.119),
P(Biden wins in 2020),
etc. are either 0 or 1, but we don’t know which.
The tools of frequentist statistics tell us what to expect, under
the assumption of certain probabilities, about hypothetical
repeated observations.
Preferred theories (models, hypotheses, ...) are those that
predict a high probability for data “like” the data observed.
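To illustrate "probability = limiting frequency", the sketch below simulates repeated rolls of an (assumed) fair die and prints the relative frequency of an even outcome for increasing numbers of trials; it should approach 1/2. The seed, trial counts and the helper `relative_frequency` are arbitrary illustrative choices.

```python
import random

def relative_frequency(event, n_trials, rng):
    """Fraction of n_trials fair-die rolls whose outcome lies in `event`."""
    hits = sum(1 for _ in range(n_trials) if rng.randint(1, 6) in event)
    return hits / n_trials

rng = random.Random(12345)   # fixed seed so the run is reproducible
even = {2, 4, 6}
for n in (10, 1000, 100_000):
    print(n, relative_frequency(even, n, rng))
```

The fluctuations shrink roughly as 1/√n, which is why a finite number of repetitions only approximates the limiting frequency.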
Bayesian Statistics − general philosophy
In Bayesian statistics, use subjective probability for hypotheses:
$$P(H|x) = \frac{P(x|H)\,\pi(H)}{\int P(x|H')\,\pi(H')\,dH'}$$
P(x|H): probability of the data assuming hypothesis H (the likelihood)
π(H): prior probability, i.e., before seeing the data
P(H|x): posterior probability, i.e., after seeing the data
The normalization involves a sum (or integral) over all possible hypotheses.

Bayes’ theorem has an “if-then” character: if your prior
probabilities were π(H), then it says how these probabilities
should change in the light of the data.
No general prescription for priors (subjective!)
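For a discrete set of hypotheses the integral in the denominator becomes a sum, and the update can be written in a few lines. The coin-tossing setup below (three candidate values of a heads probability θ, a flat prior, and 7 heads in 10 tosses) is a hypothetical example, not from the slides; a different prior would give a different posterior, which is the "if-then" character described above.

```python
from math import comb

def posterior(hypotheses, prior, likelihood, data):
    """P(H|x) = P(x|H) pi(H) / sum over H' of P(x|H') pi(H'), for discrete hypotheses."""
    unnorm = {H: likelihood(data, H) * prior[H] for H in hypotheses}
    norm = sum(unnorm.values())            # sum over all hypotheses
    return {H: w / norm for H, w in unnorm.items()}

def binomial_likelihood(data, theta):
    """P(k heads in n tosses | theta)."""
    k, n = data
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

hyps = (0.25, 0.5, 0.75)
prior = {H: 1 / 3 for H in hyps}           # assumed flat prior
post = posterior(hyps, prior, binomial_likelihood, data=(7, 10))
for H, p in post.items():
    print(f"theta = {H}: posterior = {p:.3f}")
```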



Statistical Data Analysis
Lecture 1-4

• Random variables
• Probability (density) functions:
– joint pdf
– marginal pdf
– conditional pdf



Random variables and probability density functions
A random variable is a numerical characteristic assigned to an
element of the sample space; can be discrete or continuous.
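Suppose outcome of experiment is continuous value x
$$P(x \text{ found in } [x, x + dx]) = f(x)\,dx$$
→ f(x) = probability density function (pdf)
$$\int_{-\infty}^{\infty} f(x)\,dx = 1 \qquad \text{(x must be somewhere)}$$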
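A pdf can be checked numerically against the normalization condition. This sketch assumes an exponential pdf f(x) = e^(−x/τ)/τ (x ≥ 0) with τ = 2 as the example and uses a simple midpoint rule; `exp_pdf` and `integrate` are illustrative names.

```python
import math

def exp_pdf(x, tau=2.0):
    """Example pdf: f(x) = exp(-x/tau)/tau for x >= 0, zero otherwise."""
    return math.exp(-x / tau) / tau if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# "x must be somewhere": total probability ≈ 1 (the tail beyond x = 50 is ~e^-25)
print(integrate(exp_pdf, 0.0, 50.0))

# P(x found in [1, 1 + dx]) ≈ f(1) dx for small dx
dx = 1e-4
print(integrate(exp_pdf, 1.0, 1.0 + dx, n=10), exp_pdf(1.0) * dx)
```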
Probability mass function
For discrete outcome $x_i$ with e.g. i = 1, 2, ... we have
$$P(x_i) = p_i \qquad \text{(probability (mass) function)}$$
$$\sum_i P(x_i) = 1 \qquad \text{(x must take on one of its possible values)}$$
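The same normalization check for a discrete random variable: the sketch below uses a binomial pmf (an example chosen here; specific distributions come later in the course) with assumed parameters n = 10, p = 0.3.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k) for the binomial distribution: an example discrete random variable."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
probs = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(sum(probs))  # ≈ 1.0, up to floating-point rounding
```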



Cumulative distribution function
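Probability to have outcome less than or equal to x is
$$F(x) = \int_{-\infty}^{x} f(x')\,dx' \qquad \text{(cumulative distribution function)}$$
Alternatively define pdf with
$$f(x) = \frac{\partial F(x)}{\partial x}$$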
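The two relations between f(x) and F(x) can be checked numerically. A sketch, again assuming the exponential pdf with τ = 2, whose CDF is F(x) = 1 − e^(−x/τ): integrating the pdf reproduces F, and a central finite difference of F reproduces f. The function names are illustrative.

```python
import math

TAU = 2.0  # assumed example: exponential pdf with mean tau

def pdf(x):
    return math.exp(-x / TAU) / TAU if x >= 0 else 0.0

def cdf_numeric(x, n=20_000):
    """F(x) = integral of pdf from -inf to x (pdf is zero below 0), midpoint rule."""
    if x <= 0:
        return 0.0
    h = x / n
    return h * sum(pdf((i + 0.5) * h) for i in range(n))

def cdf_exact(x):
    return 1.0 - math.exp(-x / TAU) if x >= 0 else 0.0

x, eps = 1.5, 1e-5
print(cdf_numeric(x), cdf_exact(x))
# f(x) = dF/dx, approximated by a central difference
print((cdf_exact(x + eps) - cdf_exact(x - eps)) / (2 * eps), pdf(x))
```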



Quantiles
Define quantile (α-point) $x_\alpha$ by $F(x_\alpha) = \alpha$ (0 ≤ α ≤ 1),
i.e., the quantile $x_\alpha$ is the inverse of the cumulative distribution: $x_\alpha = F^{-1}(\alpha)$.
Special case of quantile: $x_{1/2}$ = median
(compare to peak of pdf = mode)
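Since F is monotonic, a quantile can be found by solving F(x_α) = α numerically, e.g. by bisection. This sketch assumes the exponential distribution with τ = 2, for which the median is τ ln 2 ≈ 1.386 while the mode (peak of the pdf) is at x = 0; `quantile` is an illustrative helper, not a library function.

```python
import math

TAU = 2.0  # assumed example: exponential distribution with mean tau

def cdf(x):
    """CDF of the exponential distribution, F(x) = 1 - exp(-x/tau) for x >= 0."""
    return 1.0 - math.exp(-x / TAU) if x >= 0 else 0.0

def quantile(F, alpha, lo, hi, tol=1e-12):
    """Solve F(x_alpha) = alpha by bisection; F must be increasing on [lo, hi]."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("use 0 < alpha < 1 for this bracketing search")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

median = quantile(cdf, 0.5, 0.0, 100.0)
print(median, TAU * math.log(2))  # both ≈ 1.386
```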
Histograms
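Data sample $x = (x_1, \ldots, x_n)$; the number of events n could be very large.
→ Histogram $N = (N_1, \ldots, N_M)$: M bins, bin size Δx.
pdf = histogram with infinite data sample, zero bin width,
normalized to unit area:
$$f(x) = \lim_{n \to \infty,\ \Delta x \to 0} \frac{N(x)}{n\,\Delta x}$$
where N(x) is the number of entries in the bin containing x.
Often normalize histogram to unit area, compare directly to pdf.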
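The connection between a normalized histogram and the pdf can be seen with a simulated sample. A sketch, assuming an exponential pdf with τ = 2, n = 100 000 events and bins of width Δx = 0.2 (all illustrative choices): dividing each bin count by nΔx gives a histogram of unit area that tracks f(x).

```python
import math
import random

TAU = 2.0                      # assumed example: exponential pdf with mean tau
rng = random.Random(42)        # fixed seed so the run is reproducible
n = 100_000                    # number of events in the data sample
data = [rng.expovariate(1.0 / TAU) for _ in range(n)]

def pdf(x):
    return math.exp(-x / TAU) / TAU if x >= 0 else 0.0

# Histogram N = (N_1, ..., N_M) with M bins of width dx on [0, x_max)
M, x_max = 150, 30.0
dx = x_max / M
N = [0] * M
for x in data:
    if x < x_max:                          # entries beyond x_max dropped (negligible here)
        N[min(int(x / dx), M - 1)] += 1    # guard against rounding at the upper edge

# Normalize to unit area: divide each count by n * dx
density = [Ni / (n * dx) for Ni in N]
area = sum(d * dx for d in density)
print(f"area = {area:.5f}")
for i in (0, 10, 25):
    centre = (i + 0.5) * dx
    print(f"x = {centre:.1f}: histogram {density[i]:.4f}, pdf {pdf(centre):.4f}")
```

With finite n and nonzero Δx the agreement is only approximate: statistical fluctuations and the variation of f across a bin both remain.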



Multivariate distributions, joint pdf
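Outcome of experiment characterized by several values, e.g. an
n-component vector, $(x_1, \ldots, x_n)$
$$P(\text{each } x_i \text{ found in } [x_i, x_i + dx_i]) = f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n \qquad \text{(joint pdf)}$$
Normalization:
$$\int \cdots \int f(x_1, \ldots, x_n)\,dx_1 \cdots dx_n = 1$$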
Marginal pdf
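Sometimes we want only the pdf of
some (or one) of the components, e.g. for two components:
$$f_1(x_1) = \int f(x_1, x_2)\,dx_2, \qquad f_2(x_2) = \int f(x_1, x_2)\,dx_1 \qquad \text{(marginal pdf)}$$
E.g. to find the marginal pdf of $x_1$ from an n-dim. joint pdf, integrate over
all variables except $x_1$:
$$f_1(x_1) = \int \cdots \int f(x_1, \ldots, x_n)\,dx_2 \cdots dx_n$$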
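Marginalization can be done by numerical integration. The sketch assumes a two-component example, f(x, y) = x + y on the unit square (normalized, with analytic marginal f_x(x) = x + 1/2), written with x, y in place of x_1, x_2; the function names are illustrative.

```python
def f(x, y):
    """Assumed example joint pdf: f(x, y) = x + y on 0 <= x, y <= 1, zero elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(g, a, b, n=2000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def marginal_x(x):
    """f_x(x): integrate the joint pdf over y."""
    return integrate(lambda y: f(x, y), 0.0, 1.0)

def marginal_y(y):
    """f_y(y): integrate the joint pdf over x."""
    return integrate(lambda x: f(x, y), 0.0, 1.0)

# Normalization of the joint pdf, done by integrating the marginal over x
print(integrate(marginal_x, 0.0, 1.0, n=200))   # ≈ 1
# Compare with the analytic marginal f_x(x) = x + 1/2
print(marginal_x(0.3), 0.3 + 0.5)
```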



Marginal pdf (2)

Marginal pdf is the projection of the joint pdf
onto individual axes.



Conditional pdf
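Sometimes we want to consider some components of joint pdf as
constant. Recall conditional probability, here with
A: x found in [x, x + dx] and B: y found in [y, y + dy]:
$$P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{f(x, y)\,dx\,dy}{f_x(x)\,dx}$$
→ conditional pdfs:
$$h(y|x) = \frac{f(x, y)}{f_x(x)}, \qquad g(x|y) = \frac{f(x, y)}{f_y(y)}$$
where $f_x(x) = \int f(x, y)\,dy$ and $f_y(y) = \int f(x, y)\,dx$ are the marginal pdfs.
E.g. h(y|x) is a pdf for y, here x is fixed.
The denominator fixes normalization so that
$$\int h(y|x)\,dy = 1$$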
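The conditional pdf can be built numerically in the same way: divide the joint pdf by the marginal of the variable held fixed, then check that the result integrates to one in y. The joint pdf f(x, y) = x + y on the unit square is an assumed example, and `conditional_y_given_x` is a hypothetical helper name.

```python
def f(x, y):
    """Assumed example joint pdf: f(x, y) = x + y on the unit square, zero elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(g, a, b, n=2000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f_x(x):
    """Marginal pdf of x: integrate the joint pdf over y."""
    return integrate(lambda y: f(x, y), 0.0, 1.0)

def conditional_y_given_x(x):
    """Return h(.|x) = f(x, .) / f_x(x) as a function of y; f_x(x) is computed once."""
    fx = f_x(x)
    return lambda y: f(x, y) / fx

for x1 in (0.1, 0.5, 0.9):
    norm = integrate(conditional_y_given_x(x1), 0.0, 1.0, n=200)
    print(f"x = {x1}: integral of h(y|x) dy = {norm:.6f}")
```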



Conditional pdf (2)
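E.g. a joint pdf f(x, y) can be used to find the conditional pdfs $h(y|x_1)$, $h(y|x_2)$ for two fixed values $x_1$, $x_2$.
Basically treat some of the r.v.s as constant, then divide the joint
pdf by the marginal pdf of those variables being held constant so
that what is left has correct normalization, e.g. (with $f_x$ the marginal pdf of x),
$$h(y|x_1) = \frac{f(x_1, y)}{f_x(x_1)}, \qquad \int h(y|x_1)\,dy = 1$$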



Bayes’ theorem, independence for conditional pdf

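Bayes’ theorem becomes:
$$g(x|y) = \frac{h(y|x)\,f_x(x)}{f_y(y)}$$
Recall A, B independent if
$$P(A \cap B) = P(A)\,P(B)$$
→ x, y independent if
$$f(x, y) = f_x(x)\,f_y(y)$$
Then e.g. fixing y has no effect on pdf of x:
$$g(x|y) = \frac{f(x, y)}{f_y(y)} = f_x(x)$$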
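Independence can be tested by comparing f(x, y) with f_x(x) f_y(y), and Bayes' theorem for pdfs can be checked by confirming that g(x|y) built from h(y|x) is normalized in x. The two joint pdfs on the unit square below, x + y (not independent) and 4xy (independent), are assumed examples; checking a few points is a numerical illustration, not a proof.

```python
def integrate(g, a, b, n=2000):
    """Midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def marginals(f):
    """Return (f_x, f_y) for a joint pdf f on the unit square."""
    def f_x(x):
        return integrate(lambda y: f(x, y), 0.0, 1.0)
    def f_y(y):
        return integrate(lambda x: f(x, y), 0.0, 1.0)
    return f_x, f_y

def is_independent(f, points, tol=1e-9):
    """Numerically check f(x, y) == f_x(x) f_y(y) at the given test points."""
    f_x, f_y = marginals(f)
    return all(abs(f(x, y) - f_x(x) * f_y(y)) < tol for x, y in points)

def f_sum(x, y):
    return x + y        # assumed example, used only on the unit square

def f_prod(x, y):
    return 4.0 * x * y  # = (2x)(2y), assumed example on the unit square

pts = [(0.2, 0.7), (0.5, 0.5), (0.9, 0.1)]
print(is_independent(f_sum, pts))   # False
print(is_independent(f_prod, pts))  # True

# Bayes' theorem for pdfs: g(x|y0) = h(y0|x) f_x(x) / f_y(y0) should integrate to 1 over x
f_x, f_y = marginals(f_sum)
y0 = 0.8
fy0 = f_y(y0)

def g_bayes(x):
    fx = f_x(x)
    h_y0_given_x = f_sum(x, y0) / fx    # conditional pdf h(y0|x)
    return h_y0_given_x * fx / fy0      # g(x|y0) via Bayes' theorem

print(integrate(g_bayes, 0.0, 1.0, n=200))  # ≈ 1
```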
