Stats Notes
Discrete random variables just means that we have a random experiment where the
outcomes only take discrete values, and each has an assigned probability.
E.g → rolling a dice, each value has a probability of 1/6.
We notice that:
1. The sum of the probabilities always equals 1
2. The x's can take numerical or non-numerical values.
Expected value:
This is similar to the mean, it is what you would expect the mean to be if you were to
repeat the experiment many times.
A 'fair game' is an
experiment where
E(X) = 0, no gain or
loss, on average.
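The expected value of a discrete random variable can be sketched in code as the sum of each value times its probability, E(X) = Σ x · P(X = x). The function name and the betting game below are made up for illustration; only the fair die comes from the notes.

```python
from fractions import Fraction

def expected_value(distribution):
    """Return E(X) = sum of x * P(X = x) for a dict {value: probability}."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution.items())

# Fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
die_mean = expected_value(die)

# Hypothetical game (not from the notes): win 5 on a six, lose 1 otherwise.
game = {5: Fraction(1, 6), -1: Fraction(5, 6)}
game_mean = expected_value(game)

print(die_mean)   # 7/2
print(game_mean)  # 0, so this game is fair
```

Using exact fractions avoids rounding errors when checking that the probabilities sum to 1.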
How to do on calculator:
In a spreadsheet, fill in one row of the observed values, one of the expected values
Menu → Stat tests → χ² GOF
This gives a p-value, which can be compared to the significance level.
Do not include totals and additional columns or rows in the matrix.
Degrees of freedom:
n-1 → say there are 4 columns of data then the degrees of freedom is just 3.
Probability times the number of trials.
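As a rough sketch of what the goodness-of-fit test computes, the statistic is χ² = Σ (observed − expected)² / expected, with n − 1 degrees of freedom. The observed counts below are invented, and the expected counts assume that "probability times the number of trials" is how they are found; the calculator is still needed for the p-value.

```python
def chi_squared_gof(observed, expected):
    """Return (chi-squared statistic, degrees of freedom) for a GOF test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom

# Hypothetical data: 80 trials over 4 equally likely categories.
observed = [18, 22, 25, 15]
trials = sum(observed)
expected = [trials * 0.25] * 4  # probability times the number of trials

statistic, df = chi_squared_gof(observed, expected)
print(statistic, df)
```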
1. 24 + 21 = 45 (> 3 tonnes)
2. 24 + 32 + 20 + 14 (bees)
4. 45 ∗ 90/175 = 23.1
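The working above appears to use the usual rule for an expected frequency in a table: row total × column total ÷ grand total. A minimal sketch, assuming 45 is one total, 90 is the other, and 175 is the grand total (which of the two is the row and which is the column is a guess):

```python
def expected_frequency(row_total, column_total, grand_total):
    """Expected count for one cell of a contingency table."""
    return row_total * column_total / grand_total

first_total = 24 + 21              # 45 (> 3 tonnes)
second_total = 24 + 32 + 20 + 14   # 90 (bees)
grand_total = 175

value = expected_frequency(first_total, second_total, grand_total)
print(round(value, 1))  # 23.1
```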
Results:
A p-value will be calculated here as well, which you will need to compare
against the significance level provided, or your χ² value may be compared
against a provided critical value.
Collection of data
It is important to be aware of any
possible errors in the collection
methods (4.1).
Disadvantages:
You have to create a large number of
questions of equal difficulty to
measure the same quality.
Proving two parallel tests are
equivalent is difficult.
Calculating SSR
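SSR = (residual 1)^2 + (residual 2)^2 etc, where each residual is the difference between an
actual data value and the value predicted by the line of regression
What is SSR?
It is the sum of the squared residuals. For each data point it looks at the difference
between the actual value (y) and the value on the line of regression (y hat). Some textbooks
instead use SSR for the regression sum of squares, which squares the difference between the
predicted value (y hat) and the mean (y bar).
SST is the total variation: the sum of the squared differences between the actual values
and the mean (y bar).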
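A small worked sketch may help here, taking SSR to mean the sum of squared residuals (actual minus predicted) and SST the sum of squared differences from the mean. The data points and function names are invented, and the line is fitted by ordinary least squares, which is an assumption about how the regression line was found.

```python
def fit_line(xs, ys):
    """Least-squares regression line; returns (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data, not from the notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
predicted = [slope * x + intercept for x in xs]
y_bar = sum(ys) / len(ys)

ssr = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, predicted))  # squared residuals
sst = sum((y - y_bar) ** 2 for y in ys)                         # total variation
r_squared = 1 - ssr / sst

print(ssr, sst, r_squared)
```

A smaller SSR relative to SST means the line explains more of the variation in the data.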
n= sample size
same
Working out:
Type of distribution → Example X ~ N(mean, variance), i.e. X ~ N(μ, σ²)
(the calculator's normal cdf asks for the standard deviation σ, not the variance)
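For checking calculator answers, the normal cdf step can be sketched with Python's standard library. Note that `NormalDist` takes the standard deviation, not the variance. The mean of 100 and standard deviation of 15 are made-up values.

```python
from statistics import NormalDist

# Hypothetical parameters: mean 100, standard deviation 15.
mean, sd = 100, 15
X = NormalDist(mu=mean, sigma=sd)

p_below = X.cdf(115)                # P(X < 115)
p_between = X.cdf(115) - X.cdf(85)  # P(85 < X < 115)
variance = X.variance               # sd squared

print(round(p_below, 4), round(p_between, 4), variance)
```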
When?
● An event can occur any number of times during a time period (there is no known upper
limit on the number of occurrences).
● Events are independent: if an event occurs, it does not affect the probability of
another event occurring in the same time period.
● The rate of occurrence is constant: it does not change over time (the mean is constant
for any given time period).
In a Poisson distribution, the mean = the variance (the standard deviation squared). This
can be useful for testing whether data follows a Poisson distribution, because the sample
mean and sample variance should be approximately equal.
The mode is not generally equal to the mean: it is the largest whole number not exceeding
the mean (and when the mean is a whole number, mean − 1 is also a mode).
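The mean-versus-variance check can be sketched as below. The counts are invented, and how close "approximately equal" needs to be is a judgment call rather than a fixed rule.

```python
from statistics import mean, variance

# Hypothetical counts of events per time period, not from the notes.
counts = [0, 1, 2, 2, 3, 3, 4, 4, 5, 6]

sample_mean = mean(counts)
sample_variance = variance(counts)  # sample variance (divides by n - 1)

print(sample_mean, sample_variance)
# If the two values are far apart, a Poisson model is doubtful.
```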
Critical values:
Spreadsheet: first column = 0s, 2nd column = values around the mean, 3rd column =
PoissonCDF(mean, title of second column, title of 1st column). You have to label the
3rd column as well.
Or, this actually works: 1st column = values around the mean, and in the 3rd column do
PoissCDF(mean, 0, a1), then drag down (to drag lots of cells, hold and drag when the
cursor turns into the four-way arrow).
Or, sadly, you have to guess values until the probability is just on the edge of the
significance level: put in the mean, 0, then a guessed value.
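The guess-and-check search can also be sketched in code, which may help when checking a calculator answer. This is only an illustration: the function names are made up, the argument order (mean, lower, upper) copies the PoissCDF(mean, 0, a1) call above, and the mean of 5 with a 5% significance level in each tail is an assumed example.

```python
from math import exp, factorial

def poisson_cdf(lam, lower, upper):
    """P(lower <= X <= upper) for X ~ Po(lam)."""
    return sum(exp(-lam) * lam ** k / factorial(k) for k in range(lower, upper + 1))

def lower_critical_value(lam, alpha):
    """Largest c with P(X <= c) <= alpha, or None if no such c exists."""
    c = None
    k = 0
    while poisson_cdf(lam, 0, k) <= alpha:
        c = k
        k += 1
    return c

def upper_critical_value(lam, alpha):
    """Smallest c with P(X >= c) <= alpha."""
    c = 0
    while 1 - poisson_cdf(lam, 0, c - 1) > alpha:
        c += 1
    return c

# Hypothetical test: mean 5, 5% significance level in each tail.
low = lower_critical_value(5, 0.05)
high = upper_critical_value(5, 0.05)
print(low, high)  # 1 10
```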
● The independent variables must consist of two related groups or matched pairs.
Example:
Conclusions
There are two acceptable conclusions to
a hypothesis test:
α and β
α is the level of significance.
Then put in the values to the t cdf/normal cdf/binomial cdf
Find the probability of a Type II error
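As one possible illustration of the Type II error calculation, here is a sketch for a one-tailed test of a mean with known population standard deviation, using the normal cdf. Every number below (the assumed mean, the true mean, σ, n and α) is invented for the example.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical test: H0: mu = 50 against H1: mu > 50.
mu_0 = 50       # mean assumed under H0
mu_true = 55    # actual mean, needed to find beta
sigma = 10      # known population standard deviation
n = 25          # sample size
alpha = 0.05    # level of significance

standard_error = sigma / sqrt(n)

# Step 1: critical value for the sample mean under H0.
critical_value = NormalDist(mu_0, standard_error).inv_cdf(1 - alpha)

# Step 2: beta = P(fail to reject H0 | true mean is mu_true),
# i.e. the sample mean falls below the critical value.
beta = NormalDist(mu_true, standard_error).cdf(critical_value)

print(round(critical_value, 2), round(beta, 3))
```

The same two steps apply with the t or binomial cdf; only the distribution used in each step changes.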
Finding the critical value (s)
OR
σ: (pop SD known & n>30), S_(n−1) (t-distr, σ unknown or n<30) or S_(n−1)/√n
(testing x̄, t-distr, σ unknown or n<30)
Statistics → Stat tests → t-Test → Data
μ0 → population mean, the assumed mean
Continuous data is data that can take any value. Height, weight,
temperature and length are all examples of continuous data.
Mutually exclusive refers to two (or more) events that cannot both occur
when the random experiment is performed.
The critical value is the value which determines the boundary of the critical
region.
Definitions 1
4.1 Data sampling methods
Simple/random → A method such as drawing names from a hat, where every
member of the population is equally likely to be chosen.