Lecture 1

Statistical Thinking

(ETC2420/ETC5242)

Dan Simpson
Week 1: Random things
Welcome to Statistical Thinking

2
What are we doing here?

We are learning about randomness

We are learning about using computers for statistics
We are learning how to leverage computational tools to understand the
statistical properties of things
We are going to be using a lot of R!

3
Who am I?

Name: Daniel (Dan) Simpson (he/him)

From: Queensland originally, but most recently Canada.
How long have I been here? Since May.
Favourite Liza Minnelli Album: Results

4
How do you contact me?

The discussion forum: Everything that isn’t secret and all questions about
course materials and the content of assessment
Office Hour: The hour after the lecture (same Zoom link). This won’t be
recorded but it will be an easy chance to talk and will go for as long as it
needs to.
Course Email: [email protected]. This is for everything
that isn’t for the discussion forum. It is only accessed by me and the head
tutor Lachlan Macquire.
(Only for something that can’t be read by Lachlan) My email:
[email protected]

5
What is the assessment?

Week 4: 24-hour take-home quiz (electronic). 15%

Week 7: Assignment 1. Assigned groups of 3-4. 20%
Week 11: Assignment 2. Assigned groups of 3-4. 20%
Final Exam. 2 hours. 45%. (40% hurdle.)

6
Assessment policies

Lateness: 10% per day or part thereof. (There will be a very small amount of
grace granted for things that could have been conceivably held up by the
submission process. But we are talking a few minutes only.)
Extensions: Please follow the Special Considerations policy.

7
Academic honesty

Do. Not. Cheat.

If you copy someone’s work, that is cheating.
If you copy an answer from the internet without a clear reference, that is
cheating.
Check the policies for a more thorough definition of cheating and the
consequences.
Do. Not. Cheat. It will ruin your semester and it will ruin mine.

8
Computing

The course will be run using R.

You are expected to use R and RStudio. Please follow the Week 0
instructions if you have not installed it.
We will assume that you have the latest versions of both R (4.1.0) and
RStudio (1.4.1717).
We also assume you are using the most recent version of all of the packages
(if in doubt re-install them!)
The best resource for R help is always Google. You can also try our
discussion forum.

9
RMarkdown

We want you to use RMarkdown for your assignments and recommend
using it for the labs and the exercises.
There will be examples throughout the semester on how to use it (e.g. the
Week 1 bonus video).
For the assignments, we may allow the submission of documents that
weren’t prepared in RMarkdown, but there will be extra requirements.

10
What is in the course?

11
Random variables

12
Week 1 Learning Goals (Lab)

The learning goals for the lab in Week 1:

Learn how to set up R and RStudio on your own device.
Learn to install and load R packages.
Learn what R Markdown files and reproducible research are.
Learn what ‘the tidyverse’ is.
Learn some basic R commands to manipulate and plot data.

13
Week 1 Learning Goals (Lecture)

The learning goals for the lecture in Week 1:

Learn what a random variable is.

Learn what a cumulative distribution function and a probability density function are.
Use simulations to reason about random variables.

14
What is a random variable?

A random variable X is a variable that could potentially take a number of
possible values at random.
An example is X = {Number of heads from 3 coin flips}.

The realisation x of the random variable X is the value of the random
variable after it has been observed.
In the above example, x = 0 is a realisation of the random variable.

If you put more than one random variable together, you get a random vector.
E.g. if X is as above and Y is a variable indicating if my mother will win the
lottery this year, then (X, Y) is a random vector.
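
As a rough illustration of the difference between a random variable and its realisations, the sketch below draws values of X, the number of heads from 3 coin flips. The fair-coin probability of 0.5, the seed, and the made-up probability used for the indicator Y are all assumptions for illustration only.

```r
# Sketch: realisations of X = number of heads in 3 coin flips.
# The fair coin (prob = 0.5) and the seed are illustrative assumptions.
set.seed(1)

# One realisation x of the random variable X
x <- rbinom(1, size = 3, prob = 0.5)

# Ten more realisations: each is one of 0, 1, 2, 3
xs <- rbinom(10, size = 3, prob = 0.5)
print(x)
print(xs)

# A random vector (X, Y): pair X with a second 0/1 variable Y.
# The probability 0.001 for Y is hypothetical, not a real lottery figure.
y <- rbinom(1, size = 1, prob = 0.001)
print(c(X = x, Y = y))
```

Running the block again with a different seed gives different realisations of the same random variable.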

15
Example: A coin flip

Imagine a biased coin that lands heads with probability p on each flip.

We can characterize the random variable X, the number of heads we get by flipping the coin n
times, by considering the probability that it is equal to 0, 1, ..., n.

$$
\begin{aligned}
\Pr(X = 0) &= \Pr(\text{all coins land on tails}) \\
&= \underbrace{(1 - p) \times (1 - p) \times \cdots \times (1 - p)}_{n \text{ times}} \\
&= (1 - p)^n
\end{aligned}
$$

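Can we get the probability of exactly k heads?

$$
\begin{aligned}
\Pr(X = k) &= \Pr(k \text{ coins land on heads and } (n - k) \text{ coins land on tails}) \\
&= \underbrace{p \times p \times \cdots \times p}_{k \text{ times}} \times \underbrace{(1 - p) \times (1 - p) \times \cdots \times (1 - p)}_{(n - k) \text{ times}} \\
&= p^k (1 - p)^{n - k}
\end{aligned}
$$
Is this correct?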
16
Let’s do some maths

Obviously we want to check this is correct!

So let’s check the example when n = 2.

$$
\begin{aligned}
\Pr(X = 0) &= (1 - p)^2 \\
\Pr(X = 1) &= p(1 - p) \\
\Pr(X = 2) &= p^2
\end{aligned}
$$

Is this correct?

17
Ok but maths is annoying. Let’s use a computer

N = 1e+06 # number of simulations

p = 0.3 # any probability between 0 and 1 will work

## Simulate N trials using rbinom

X = rbinom(N, size = 2, prob = p)

paste("k = 0: ", mean(X == 0), (1 - p)^2)

[1] "k = 0: 0.490152 0.49"

paste("k = 1: ", mean(X == 1), p * (1 - p))

[1] "k = 1: 0.419769 0.21"

paste("k = 2: ", mean(X == 2), p^2)

[1] "k = 2: 0.090079 0.09"

18

So obviously something went wrong

Pr(X = 1) is twice what we expected.

Why? Because there are 2 ways it could happen: HT or TH!
Each of these events has probability p(1 − p).
So the probability of either HT or TH happening is 2p(1 − p).
Checking your maths with simulations saves a lot of heartache!
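
The same check can be repeated with the ordering count included. The sketch below multiplies p^k (1 − p)^(n − k) by choose(n, k), the number of ways to arrange k heads among n flips (2 when n = 2 and k = 1, matching HT and TH), and compares it with simulated proportions and with R's built-in dbinom(). The values of N and p and the seed are illustrative assumptions.

```r
# Sketch: simulated proportions versus the formula with the ordering count.
# N, p and the seed are illustrative choices.
set.seed(2420)
N <- 1e5
n <- 2
p <- 0.3

X <- rbinom(N, size = n, prob = p)

k <- 0:n
simulated <- sapply(k, function(j) mean(X == j))

# choose(n, k) counts the orderings, e.g. HT and TH when n = 2, k = 1
corrected <- choose(n, k) * p^k * (1 - p)^(n - k)

# dbinom() is R's binomial probability function and should agree
print(data.frame(k, simulated, corrected, dbinom = dbinom(k, n, p)))
```

Changing n to a larger value is a quick way to see that the agreement is not special to two flips.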

19
Lesson: Everything about probability is a statement about realisations
of random variables

The probability of an event is the idealised proportion of times the event
happens.
It corresponds to how often it would happen on average if we tried an
infinite number of times.
We definitely cannot try things an infinite number of times.
But we can definitely try them a large and finite number of times.
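
To see what "a large and finite number of times" buys us, the sketch below tracks the running proportion of heads as the number of flips grows. The head probability of 0.3, the number of flips, and the seed are assumptions chosen for illustration.

```r
# Sketch: the running proportion of heads settles near p as flips accumulate.
# p, the number of flips and the seed are illustrative choices.
set.seed(1)
p <- 0.3
flips <- rbinom(1e5, size = 1, prob = p)  # 1 = head, 0 = tail

# Proportion of heads after 1, 2, 3, ... flips
running_prop <- cumsum(flips) / seq_along(flips)

# Look at the proportion after increasingly many flips
checkpoints <- c(10, 100, 1000, 10000, 100000)
print(data.frame(flips = checkpoints, proportion = running_prop[checkpoints]))
```

The early proportions can be far from p, while the later ones should sit close to it.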

20
What about continuous outcomes

When outcomes are discrete (eg number of heads), we can enumerate them
and look at the probability of each outcome.
But when it’s continuous, (eg height) we cannot do this.
So instead we look at the cumulative distribution function or CDF
F(x) = Pr(X ≤ x)
F(x) tends to zero as x → −∞ and to one as x → ∞.
F(x) never decreases. Why?
For example: The exponential distribution
$$F(x) = 1 - e^{-\lambda x}, \quad x \ge 0,$$
for some λ > 0.
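
As a small sketch of how this CDF behaves, the block below evaluates the exponential formula by hand, compares it with R's built-in pexp(), and checks one value against the proportion of simulated draws. The rate λ = 1, the grid of x values, and the seed are illustrative assumptions.

```r
# Sketch: the exponential CDF by hand, via pexp(), and via simulation.
# lambda = 1, the grid and the seed are illustrative choices.
set.seed(1)
lambda <- 1

F_exp <- function(x, lambda) 1 - exp(-lambda * x)

x <- seq(0, 4, by = 0.5)
print(cbind(x, by_hand = F_exp(x, lambda), pexp = pexp(x, rate = lambda)))

# Pr(X <= 1) estimated as a proportion of simulated draws
draws <- rexp(1e5, rate = lambda)
print(c(simulated = mean(draws <= 1), exact = F_exp(1, lambda)))

# Draw the curve when running interactively
if (interactive()) {
  plot(x, F_exp(x, lambda), type = "l", xlab = "X", ylab = "F(x)")
}
```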

21
We can plot the CDF!

22
We can plot the CDF!

[Figure: plot of the CDF; x-axis: X (0 to 4), y-axis: y (0 to 1).]

23
But is this really how we usually visualise a distribution?

data %>%
ggplot(aes(x = X)) + geom_histogram(...)

24
But is this really how we usually visualise a distribution?

[Figure: histogram; x-axis: X, y-axis: density.]

25
This histogram “is” the derivative of the CDF

We call this the probability density function.
We usually write it as p(x) and it satisfies
$$\Pr(X \le x) = \int_{-\infty}^{x} p(t)\,dt$$
By the fundamental theorem of calculus, p(x) is the derivative of the CDF.
So for the exponential distribution with $F(x) = 1 - e^{-\lambda x}$, the pdf is
$$p(x) = \lambda e^{-\lambda x}$$
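
The derivative relationship can be checked numerically. The sketch below approximates the derivative of the exponential CDF with a central finite difference and compares it with λe^(−λx); it also confirms that the pdf integrates to one. The rate λ = 2, the grid, and the step size h are illustrative assumptions.

```r
# Sketch: numerically differentiate the exponential CDF and compare with the pdf.
# lambda, the grid and the step size h are illustrative choices.
lambda <- 2
x <- seq(0.1, 3, by = 0.1)
h <- 1e-5

# Central finite difference approximation to dF/dx, using pexp() as F
numeric_deriv <- (pexp(x + h, rate = lambda) - pexp(x - h, rate = lambda)) / (2 * h)

# The pdf from the formula
pdf_formula <- lambda * exp(-lambda * x)

# Largest disagreement over the grid (should be tiny)
print(max(abs(numeric_deriv - pdf_formula)))

# The pdf should integrate to one over [0, Inf)
total <- integrate(function(t) lambda * exp(-lambda * t), lower = 0, upper = Inf)$value
print(total)
```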

26
Here’s a plot

[Figure: density plot; x-axis: X, y-axis: density.]

27
A confession: That last plot was not very easy to make

28
What is the value of simulation?

We can use simulations to look at weird functions of random variables.
This is important because a test statistic in a hypothesis test is a function of
the data and the data is assumed to be iid realisations of a random variable.
An example: If $z_1, z_2 \sim N(0, 1)$ independently, what is the density of $x = z_1^2 + z_2^2$?
(You can prove using calculus that x has a Chi-squared distribution with 2
degrees of freedom.)
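
A minimal sketch of the simulation approach for this example: draw many pairs of independent standard normals, form the sum of squares, and compare simulated proportions with the Chi-squared CDF from pchisq(). The number of simulations, the comparison points, and the seed are illustrative assumptions.

```r
# Sketch: simulate x = z1^2 + z2^2 and compare with a Chi-squared(2) distribution.
# N, the comparison points q and the seed are illustrative choices.
set.seed(1)
N <- 1e5

z1 <- rnorm(N)
z2 <- rnorm(N)
x_sim <- z1^2 + z2^2

# Compare Pr(x <= q) from the simulation with pchisq()
q <- c(0.5, 1, 2, 5, 10)
simulated <- sapply(q, function(v) mean(x_sim <= v))
exact <- pchisq(q, df = 2)
print(cbind(q, simulated, exact))

# Overlay the Chi-squared density on a histogram when running interactively
if (interactive()) {
  hist(x_sim, breaks = 100, freq = FALSE, xlab = "x", main = "")
  curve(dchisq(x, df = 2), add = TRUE)
}
```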

29
Doing it with simulations

[Figure: density plot; x-axis: x, y-axis: density.]

30
But we can do much more complex things

Scenario: You are in your mum’s car and she has the radio tuned to an oldies
station. You notice that they play Meat Loaf’s Paradise By The Dashboard Light
every time the disk jockey needs a break. Curious, you time the intervals between
plays and you notice that the gaps are roughly exponentially distributed with
rate parameter λ = 0.01.

1. Assuming the gaps are independent, what is the median amount of time
between spins of Paradise by the Dashboard Light?
2. If you listen to the station long enough for Paradise by the Dashboard Light
to be played five times, what is the distribution of the smallest gap between
plays? (Assuming playing times are independent.)

31
The exact solutions: Part 1.

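The median of a continuous distribution is the point $x_{0.5}$ such that $F(x_{0.5}) = 0.5$.
Plugging in the CDF for the exponential distribution, we get
$$
\begin{aligned}
F(x_{0.5}) &= \frac{1}{2} \\
1 - e^{-\lambda x_{0.5}} &= \frac{1}{2} \\
\frac{1}{2} &= e^{-\lambda x_{0.5}} \\
-\log 2 &= -\lambda x_{0.5} \\
x_{0.5} &= \frac{\log 2}{\lambda}
\end{aligned}
$$
In this case, we get $x_{0.5} \approx 69.3$.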
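
The median can be cross-checked in three ways, sketched below: the formula log(2)/λ, R's quantile function qexp(), and the sample median of simulated gaps. The number of simulated gaps and the seed are illustrative assumptions; λ = 0.01 comes from the scenario.

```r
# Sketch: three routes to the median gap for rate lambda = 0.01.
# The number of simulated gaps and the seed are illustrative choices.
set.seed(1)
lambda <- 0.01

median_formula <- log(2) / lambda          # from the derivation
median_qexp <- qexp(0.5, rate = lambda)    # R's exponential quantile function
gaps <- rexp(1e5, rate = lambda)
median_sim <- median(gaps)                 # sample median of simulated gaps

print(c(formula = median_formula, qexp = median_qexp, simulated = median_sim))
```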

32
Part 2 is harder.

It is possible to work out this answer mathematically.

Let $X_1, \ldots, X_4$ be the lengths of the four gaps.
We can assume that $X_1, \ldots, X_4 \overset{\text{iid}}{\sim} \text{Exp}(0.01)$.
Let $Y = \min\{X_1, X_2, X_3, X_4\}$.
Then we can show (how?) that
$$\Pr(Y \le y) = 1 - e^{-0.04y}$$

33
Task: Use a simulation to validate this result

[Figure: density plot; y-axis: density, x-axis from 0 to 200.]
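
One possible way to carry out this validation, as a sketch: simulate many sets of four independent Exp(0.01) gaps, take the smallest gap in each set, and compare simulated proportions with 1 − e^(−0.04y). The number of simulations, the comparison points, and the seed are illustrative assumptions.

```r
# Sketch: simulate the smallest of four Exp(0.01) gaps and compare with
# the claimed CDF 1 - exp(-0.04 * y). N, y and the seed are illustrative.
set.seed(1)
N <- 1e5
lambda <- 0.01

# Each row holds one set of four gaps
gaps <- matrix(rexp(4 * N, rate = lambda), ncol = 4)

# Smallest gap in each row
Y <- pmin(gaps[, 1], gaps[, 2], gaps[, 3], gaps[, 4])

y <- c(10, 25, 50, 100)
simulated <- sapply(y, function(v) mean(Y <= v))
claimed <- 1 - exp(-0.04 * y)
print(cbind(y, simulated, claimed))

# Histogram with the Exp(0.04) density overlaid when running interactively
if (interactive()) {
  hist(Y, breaks = 100, freq = FALSE, main = "")
  curve(dexp(x, rate = 0.04), add = TRUE)
}
```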

34
Simulations and hypothesis tests

35
What is a hypothesis test (a reminder)

Hypothesis tests are used to assess whether or not a data set is meaningfully
different from some baseline distribution.
Often, the baseline distribution corresponds to the least interesting case.
In this situation, the hypothesis test can be used to assess if the data is
different from the least interesting case.
But how do you assess difference? We need a single number summary.
This is the test statistic. It usually measures something relevant.
For instance, if our null hypothesis was that the mean of a random variable
was zero, then a test statistic might be T(y) = ȳ.

36
A note of caution

A hypothesis test is only as meaningful as its assumptions.

The distribution of the null hypothesis needs to be plausible.
The test statistic needs to be useful (e.g. two distributions can have the same
mean but be very, very different).

37
A hypothesis test

Imagine you have a sample of 100 gaps between plays of Paradise by the Dashboard
Light and you want to test if the average time between plays is more
than one hour.
H0: µ = 60 vs H1: µ > 60.
Furthermore, we know that whatever µ is, the data is iid Exp(1/µ).
Our test statistic is T(y) = ȳ − 60
For our observed data, we have T(y) = 11
Is this observed data unusual under H0 ?

38
How do we answer this with simulation?

We want to look at Pr(T(y) > 11) when y is 100 iid samples from Exp(1/60).
We can compute this if we have samples from T(y).
We can get one sample from T(y) under H0 as follows:
1. Sample 100 iid exponentials with mean 60: y = rexp(100, 1/60)
2. Compute mean(y) - 60
If we get a lot of samples from T(y) under H0 we can assess how unusual the
value of 11 is.
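
The two steps above can be repeated many times with replicate(), as in the sketch below. The number of repetitions and the seed are illustrative assumptions, so the estimated probability will vary slightly from run to run and from any value reported elsewhere.

```r
# Sketch: simulate the test statistic T(y) = mean(y) - 60 under H0.
# n_sims and the seed are illustrative choices.
set.seed(1)
n_sims <- 20000

# Each repetition: sample 100 exponentials with mean 60, then compute T(y)
T_null <- replicate(n_sims, {
  y <- rexp(100, rate = 1 / 60)
  mean(y) - 60
})

# Proportion of simulated test statistics above the observed value of 11
p_value <- mean(T_null > 11)
print(paste("Prob T(y) > 11 =", p_value))

# Histogram of the simulated test statistic when running interactively
if (interactive()) {
  hist(T_null, breaks = 50, xlab = "test", main = "")
  abline(v = 11)
}
```

A small estimated probability suggests the observed value would be unusual if H0 were true.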

39
A plot
[Figure: histogram of the simulated test statistic; x-axis: test, y-axis: count.]

[1] "Prob T(y) > 11 = 0.03971"

40
This is a useful general technique that we will look at more

This is a generic procedure that lets us perform a hypothesis test without
needing to know the precise distribution of the test statistic!
We need 2 ingredients: A test statistic and a null hypothesis that specifies the
distribution of the data under the null.
This procedure is often called a Monte Carlo hypothesis test.
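
To make the two ingredients explicit, here is a sketch of a small helper that takes a null simulator and a test statistic as functions. The name mc_p_value and its arguments are hypothetical, invented for this illustration, and the helper only handles a one-sided "greater than" alternative. The example swaps in a median-based statistic with a made-up observed value of 10, to show that no new maths is needed when the statistic changes.

```r
# Sketch of a generic one-sided Monte Carlo test. The function name and its
# arguments are hypothetical, not from any package.
#   observed      : observed value of the test statistic
#   simulate_null : function() returning one data set simulated under H0
#   test_stat     : function(y) returning a single number
mc_p_value <- function(observed, simulate_null, test_stat, n_sims = 10000) {
  null_stats <- replicate(n_sims, test_stat(simulate_null()))
  # Proportion of null statistics above the observed value
  mean(null_stats > observed)
}

set.seed(1)

# Example: same null as before (100 iid exponentials with mean 60), but the
# statistic is the sample median minus the null median 60 * log(2).
# The observed value 10 is made up for illustration.
p_med <- mc_p_value(
  observed = 10,
  simulate_null = function() rexp(100, rate = 1 / 60),
  test_stat = function(y) median(y) - 60 * log(2)
)
print(p_med)
```

Only the two function arguments change between problems; the simulation loop stays the same.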
