Week 2 Cheat Sheet
Charlie Nuttelman
Here, I provide the mathematical equations and some of the important Excel functions required to
perform various calculations in Week 2 of the course. The headings represent the screencasts in which
you will find those calculations and concepts. Not all screencasts are referenced below, just the ones
that have complex mathematical formulas or Excel formulas that are tricky to use.
Population and sample mean (average) can be calculated using the AVERAGE function in Excel.
Population and sample variances can be calculated using the VAR.P and VAR.S functions, respectively,
and the population and sample standard deviations can be calculated using the STDEV.P and STDEV.S
functions, respectively. The COUNT function is useful for counting the number of observations.
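For readers who want to cross-check a spreadsheet, the following is a minimal sketch of the same quantities using Python's standard `statistics` module. The data values are purely illustrative, and the dictionary keys simply label each result with the Excel function I believe it corresponds to.

```python
import statistics

# Illustrative data only; any list of numbers works.
data = [5, 9, 12, 14, 17, 18, 21, 22, 25]

summary = {
    "AVERAGE": statistics.mean(data),       # arithmetic mean
    "VAR.P": statistics.pvariance(data),    # population variance (divide by n)
    "VAR.S": statistics.variance(data),     # sample variance (divide by n - 1)
    "STDEV.P": statistics.pstdev(data),     # population standard deviation
    "STDEV.S": statistics.stdev(data),      # sample standard deviation
    "COUNT": len(data),                     # number of observations
}

for name, value in summary.items():
    print(f"{name:8s} {value}")
```

The only difference between the `.P` and `.S` versions is the divisor (n versus n − 1), so the sample values are always slightly larger for the same data.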
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$$
For example:
$$\sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5 = 15$$
The summation symbol is used in the definition and calculation of average and variance (see below).
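To make the role of the summation concrete, here is a small sketch that builds the mean and the two variances from explicit sums. It assumes the standard textbook definitions (sum of squared deviations divided by n for the population and by n − 1 for the sample); the variable names are my own.

```python
x = [1, 2, 3, 4, 5]
n = len(x)

total = sum(x)                               # 1 + 2 + 3 + 4 + 5
mean = total / n                             # sum of x_i divided by n

# Sum of squared deviations from the mean
ss = sum((xi - mean) ** 2 for xi in x)

population_variance = ss / n                 # assumed definition: divide by n
sample_variance = ss / (n - 1)               # assumed definition: divide by n - 1

print(total, mean, population_variance, sample_variance)
```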
Descriptive Statistics
Another common measure of spread in a set of data is the range of the data, which is just the maximum
value in the data set minus the minimum value. We can calculate the maximum value of a set of data in
Excel using the MAX function and the minimum value using the MIN function; the range is simply the
difference between those two values.
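As a quick illustration, the same calculation in Python uses the built-in `max` and `min`; the data below are made-up values and do not need to be sorted.

```python
# Illustrative, unsorted data
data = [14, 5, 22, 9, 25, 12, 18, 21, 17]

data_max = max(data)              # analogous to Excel's MAX
data_min = min(data)              # analogous to Excel's MIN
data_range = data_max - data_min  # range = maximum - minimum

print(data_max, data_min, data_range)
```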
Skewness and kurtosis are sometimes used to describe how the shape of a set of data differs from
the normal distribution: skewness measures asymmetry, and kurtosis measures how heavy the tails are.
The SKEW and KURT functions in Excel can determine these parameters. For
more information on how to interpret these values, please visit support.microsoft.com.
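The sketch below shows one way these two parameters can be computed by hand. It assumes the bias-adjusted sample formulas that, to my understanding, Excel documents for SKEW and KURT (kurtosis reported as excess over the normal distribution); the function names are hypothetical and the formulas should be checked against Microsoft's documentation before relying on them.

```python
import math

def _mean_and_sample_sd(data):
    n = len(data)
    mean = sum(data) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return mean, sd

def sample_skewness(data):
    """Bias-adjusted skewness (assumed to match Excel's SKEW)."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean, sd = _mean_and_sample_sd(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in data)

def sample_excess_kurtosis(data):
    """Bias-adjusted excess kurtosis (assumed to match Excel's KURT)."""
    n = len(data)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean, sd = _mean_and_sample_sd(data)
    fourth = sum(((x - mean) / sd) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth - correction

symmetric = [1, 2, 3, 4, 5]
print(sample_skewness(symmetric))         # symmetric data: skewness near 0
print(sample_excess_kurtosis(symmetric))  # flat data: negative excess kurtosis
```

A positive skewness indicates a longer right tail, a negative one a longer left tail; an excess kurtosis near zero indicates tails similar to the normal distribution.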
Once we have the rank, we can linearly interpolate between ordered values in our data. For example, if
our (ordered) data is: 5, 9, 12, 14, 17, 18, 21, 22, 25 (𝑛 = 9) and we wish to find the first quartile
including the median, we would calculate the rank as 𝑘 = 0.25 ∙ (9 − 1) + 1 = 3. Therefore, the first
quartile in this case is 12. Similarly, the third quartile would be calculated to be 21 (𝑘 = 7).
For the same data set, if we wished to find the first quartile excluding the median, we would calculate
the rank as 𝑘 = (9 + 1) ∙ 0.25 = 2.5. Therefore, we linearly interpolate 50% of the way between the 2nd
and 3rd values of the ordered data, and the first quartile is 9 + 0.5 × (12 − 9) = 10.5. Similarly, the third
quartile would be calculated to be 21.5 (𝑘 = 7.5).
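The two rank conventions used in the worked examples above can be sketched as a single function. The rank formulas 𝑘 = 𝑝(𝑛 − 1) + 1 (median included) and 𝑘 = 𝑝(𝑛 + 1) (median excluded) are read off those examples; the function name, the `include_median` flag, and the clamping of out-of-range ranks are my own assumptions for illustration.

```python
import math

def quantile_by_rank(sorted_data, p, include_median=True):
    """Interpolated quantile of already-sorted data for a fraction p in [0, 1]."""
    n = len(sorted_data)
    if include_median:
        k = p * (n - 1) + 1      # rank, median-included convention
    else:
        k = p * (n + 1)          # rank, median-excluded convention
    # Assumption: clamp the rank to 1..n instead of raising an error
    k = min(max(k, 1), n)
    lower = math.floor(k)        # 1-based position of the value at or below the rank
    frac = k - lower             # how far to move toward the next ordered value
    if lower >= n:
        return float(sorted_data[-1])
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + frac * (above - below)

data = [5, 9, 12, 14, 17, 18, 21, 22, 25]

print(quantile_by_rank(data, 0.25, include_median=True))   # first quartile, k = 3
print(quantile_by_rank(data, 0.75, include_median=True))   # third quartile, k = 7
print(quantile_by_rank(data, 0.25, include_median=False))  # first quartile, k = 2.5
print(quantile_by_rank(data, 0.75, include_median=False))  # third quartile, k = 7.5
```

Note that the data must be sorted in ascending order before calling the function, since the rank refers to a position in the ordered list.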
Percentiles are calculated in exactly the same way, but 𝑝 can be any continuous value between 0 and 1. For
example, for the above data set, if we wanted to calculate the median-excluded 13th percentile, we
calculate the rank: 𝑘 = (9 + 1) ∙ 0.13 = 1.3. The 13th percentile is then 30% of the way between the 1st
and 2nd of the ordered values = 5 + 0.3 × (9 − 5) = 6.2.
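As a cross-check on the hand calculation, Python's standard library offers `statistics.quantiles`, whose `"exclusive"` method appears to use the same rank 𝑘 = 𝑝(𝑛 + 1). This is a sketch assuming Python 3.8 or later; asking for 100 groups returns the 99 percentile cut points, so the 13th percentile sits at index 12.

```python
import statistics

data = [5, 9, 12, 14, 17, 18, 21, 22, 25]

# 99 cut points: percentiles[0] is the 1st percentile, percentiles[98] the 99th
percentiles = statistics.quantiles(data, n=100, method="exclusive")
p13 = percentiles[12]

# The same call with n=4 gives the three median-excluded quartiles
quartiles = statistics.quantiles(data, n=4, method="exclusive")

print(p13)
print(quartiles)
```

Switching to `method="inclusive"` corresponds to the median-included convention instead.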
Histograms
The best way to visualize the distribution of univariate data is the use of a histogram. In a histogram,
the data are sorted into “bins” of constant width and frequencies of each bin are plotted as a column
chart. We typically estimate a lower bound and an upper bound for the number of bins:
$$n_{\text{bins,lower}} = \mathrm{INT}\left(\mathrm{LOG}_2(n)\right) - 1$$
Here, 𝑛 is the number of observations or experimental measurements. I like to choose the actual
number of bins to be somewhere between the lower and upper estimates for number of bins.
Excel’s histogram tool (Data → Data Analysis → Histogram) is great for parsing the data into the bins,
but the user must provide the bin boundaries.
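The binning workflow can be sketched in Python as follows. This is only an illustration under several assumptions: the lower-bound estimate follows the formula as it reads in this sheet, the bin boundaries are treated as upper limits with a value counted in the first bin whose boundary is greater than or equal to it (my understanding of how Excel's tool behaves), and the function names and the choice of 4 bins are hypothetical.

```python
import math
from bisect import bisect_left

def lower_bin_estimate(n):
    """Lower estimate for the number of bins, per the formula in this sheet."""
    return int(math.log2(n)) - 1

def bin_upper_bounds(data, n_bins):
    """Equal-width bin boundaries (upper limits) spanning min(data)..max(data)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    return [lo + width * (i + 1) for i in range(n_bins)]

def bin_counts(data, upper_bounds):
    """Count values per bin; a value equal to a boundary falls in that bin."""
    counts = [0] * len(upper_bounds)
    for x in data:
        idx = bisect_left(upper_bounds, x)
        idx = min(idx, len(counts) - 1)  # guard against round-off at the top edge
        counts[idx] += 1
    return counts

data = [5, 9, 12, 14, 17, 18, 21, 22, 25]
print(lower_bin_estimate(len(data)))   # lower estimate for n = 9

bounds = bin_upper_bounds(data, 4)     # 4 bins chosen by hand for illustration
print(bounds)
print(bin_counts(data, bounds))        # frequencies to plot as a column chart
```

The list of boundaries is what would be supplied to Excel as the bin range, and the resulting counts are the column heights of the histogram.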