Statistics
FREQUENCY DISTRIBUTIONS.
MEASURES OF LOCATION AND
VARIABILITY.
Mónica Marbán
VARIABLES
Statistical variable
In statistics, we generally want to study a population, that is, an entire collection of
persons, things or objects.
To study the larger population, we select a sample.
From the sample data, we can calculate a “statistic” (a number that describes a property of the
sample).
A statistic is used to estimate the corresponding population “parameter” (a number that describes a property of the population).
Types of variables:
– Quantitative: discrete or continuous
– Qualitative: nominal or ordinal
REALISE THAT VARIABLES ARE NOT JUST THOSE THINGS THAT CAN BE
MEASURED IN THE TRADITIONAL SENSE.
Discrete: the values are integer numbers coming from a counting process (the number
of children in a family, the number of employees in a company, the number of
siblings,…)
Continuous: the values are real numbers coming from a measurement process (the
height and weight of a person, the amount of time it takes a train to arrive at its
destination, the salary earned by an employee,…)
Nominal: the variable is an attribute whose categories are names or words that
cannot be ordered (nationality, profession, eye colour, type of accommodation, reasons
for travel, place of vacation next year, place of vacation last year, possibility of
a second residence, …)
Ordinal: as with nominal variables, the values are categories, but they can be ranked, so an order is introduced into the data
(category of hotel/apartment, opinion about the price, your feelings about the trip,
quality compared with previous visits, type of selected tourist package, …). Ordinal data
include items such as rating scales and Likert scales, and are frequently used to ask
about opinions and attitudes.
Considerations in sampling: the sample size (at least 30 if you are going to
apply statistical analysis to your data), the representativeness of the sample (it
allows us to draw conclusions), access to the sample, and the sampling
strategy to be used.
FREQUENCY DISTRIBUTIONS
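• Values, xi: the different values taken by the variable. If the variable is not
nominal, its values must be ordered from lowest to highest.
• Absolute frequency, ni: the number of observations equal to xi. The absolute frequencies add up to N, the total number of observations.
• Relative frequency, fi: the ratio of the absolute frequency to N (fi = ni/N). It is a proportion.
• Absolute cumulative frequency, Ni: the number of observations that are equal to
or lower than a given value.
• Relative cumulative frequency, Fi: the ratio of the absolute cumulative frequency to N (Fi = Ni/N).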
xi ni fi Ni Fi
x1 n1 f1 N1 F1
x2 n2 f2 N2 F2
… … … … …
xi ni fi Ni Fi
… … … … …
xn nn fn Nn=N Fn=1
∑ N 1
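As a minimal sketch (assuming Python and a non-nominal variable whose values can be sorted), the five columns of this table can be computed from a list of raw observations. The function name `frequency_table` and the sample data are hypothetical.

```python
from collections import Counter


def frequency_table(data):
    """Return (xi, ni, fi, Ni, Fi) rows, with the values sorted from lowest to highest."""
    counts = Counter(data)      # absolute frequency ni of each distinct value
    N = len(data)               # total number of observations
    rows = []
    Ni = 0                      # running absolute cumulative frequency
    for x in sorted(counts):
        ni = counts[x]
        Ni += ni
        rows.append((x, ni, ni / N, Ni, Ni / N))
    return rows


if __name__ == "__main__":
    # Hypothetical sample: number of children in 10 families
    children = [0, 2, 1, 2, 3, 2, 1, 0, 2, 1]
    for row in frequency_table(children):
        print(row)
```

The last row always has Ni = N and Fi = 1, matching the bottom line of the table.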
To build a complete frequency distribution, it is enough to know the values of the
variable, any one of the frequency columns and N.
Two frequency distributions are equal if the values of the variables are the same and so
are the corresponding relative frequencies.
The range of a variable is the difference between its highest and lowest values
(R = maximum value − minimum value).
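The first two statements can be illustrated with a short Python sketch. It rebuilds every column from the values, the relative frequencies fi and N, and compares two distributions through their values and relative frequencies only. The function names are hypothetical, and the reconstruction assumes each fi·N is (up to rounding) a whole number.

```python
def complete_from_relative(values, rel_freqs, N):
    """Rebuild (xi, ni, fi, Ni, Fi) rows from the values, the fi column and N."""
    rows = []
    Ni = 0
    for x, fi in zip(values, rel_freqs):
        ni = round(fi * N)          # ni = fi * N; round() absorbs floating-point error
        Ni += ni                    # absolute cumulative frequency
        rows.append((x, ni, fi, Ni, Ni / N))
    return rows


def same_distribution(values_a, rel_a, values_b, rel_b, tol=1e-9):
    """Equal distributions: same values and same relative frequencies (N may differ)."""
    if list(values_a) != list(values_b) or len(rel_a) != len(rel_b):
        return False
    return all(abs(p - q) <= tol for p, q in zip(rel_a, rel_b))


if __name__ == "__main__":
    print(complete_from_relative([1, 2, 3], [0.2, 0.5, 0.3], 10))
    # Samples of size 10 and 20 with the same proportions describe the same distribution
    print(same_distribution([1, 2, 3], [0.2, 0.5, 0.3],
                            [1, 2, 3], [4 / 20, 10 / 20, 6 / 20]))
```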
When the data are grouped into intervals, we have a grouped frequency distribution.
It has the following additional elements:
– Classes or intervals in which the data are included, [li-1;li) or (li-1;li]
– The amplitude, width or length of each interval, ci: the difference between the upper
and the lower class limits. The width can be constant or variable.
– Class mark or midpoint, x'i: the value representing the class, calculated by
dividing the sum of the upper and lower class limits by 2.
– Density, di: the concentration of data inside the interval, calculated by
dividing the absolute frequency by the width of the interval.
[li-1;li)    ci   x'i   ni   fi   di   Ni       Fi
[l0;l1)      c1   x'1   n1   f1   d1   N1       F1
[l1;l2)      c2   x'2   n2   f2   d2   N2       F2
…            …    …     …    …    …    …        …
[li-1;li)    ci   x'i   ni   fi   di   Ni       Fi
…            …    …     …    …    …    …        …
[ls-1;ls)    cs   x's   ns   fs   ds   Ns = N   Fs = 1
∑                       N    1
$$c_i = l_i - l_{i-1}; \qquad x'_i = \frac{l_{i-1} + l_i}{2}; \qquad d_i = \frac{n_i}{c_i}$$
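A minimal Python sketch of these formulas, assuming classes of the form [li-1;li) given as a list of limits l0, l1, …, ls plus a list of absolute frequencies. The function name `grouped_table` and the example limits are hypothetical; each row follows the column order of the table above.

```python
def grouped_table(limits, counts):
    """Rows (class, ci, x'i, ni, fi, di, Ni, Fi) for classes [l_{i-1}; l_i)."""
    if len(limits) != len(counts) + 1:
        raise ValueError("there must be exactly one more limit than class counts")
    N = sum(counts)
    rows = []
    Ni = 0
    for lower, upper, ni in zip(limits, limits[1:], counts):
        ci = upper - lower            # width: ci = li - li-1
        mark = (lower + upper) / 2    # class mark: x'i = (li-1 + li) / 2
        di = ni / ci                  # density: di = ni / ci
        Ni += ni                      # absolute cumulative frequency
        rows.append(((lower, upper), ci, mark, ni, ni / N, di, Ni, Ni / N))
    return rows


if __name__ == "__main__":
    # Hypothetical classes [0;10), [10;20), [20;40) with 5, 10 and 5 observations
    for row in grouped_table([0, 10, 20, 40], [5, 10, 5]):
        print(row)
```

Because the third class is twice as wide as the first, its density (0,25) is half that of the first class (0,5) even though both contain 5 observations.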
Example: We asked 30 students, chosen at random, for the highest year in which they are
enrolled in a course:
Put this array into a frequency distribution table with columns for values, frequencies
and percentages.
X ni Ni fi Fi
1st 10 N1= 10
2nd 8 N2= 10 + 8 = 18
3rd 6 N3= 10 + 8 + 6 = 24
4th 6 N4 = 10 + 8 + 6 + 6 = 30
To calculate the cumulative relative frequency, Fi, add the relative frequencies (fi) of all
the previous rows to the relative frequency of the current row. The last entry
of the cumulative relative frequency column is 1, indicating that one hundred percent of the
data has been accumulated.
$$F_i = \frac{N_i}{N}$$
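Both procedures give the same Fi. The Python sketch below checks this on the 30-student example, using exact fractions (an implementation choice, to avoid floating-point rounding) and `itertools.accumulate` for the running sums.

```python
from fractions import Fraction
from itertools import accumulate

years = ["1st", "2nd", "3rd", "4th"]
n = [10, 8, 6, 6]                              # absolute frequencies ni from the example
N = sum(n)                                     # 30 students

f = [Fraction(ni, N) for ni in n]              # relative frequencies fi = ni / N
N_cum = list(accumulate(n))                    # absolute cumulative frequencies Ni
F_from_f = list(accumulate(f))                 # Fi: previous fi plus the current fi
F_from_N = [Fraction(Ni, N) for Ni in N_cum]   # Fi = Ni / N

assert F_from_f == F_from_N                    # both procedures agree exactly

for year, ni, Ni, fi, Fi in zip(years, n, N_cum, f, F_from_N):
    print(f"{year}: ni={ni} Ni={Ni} fi={float(fi):.3f} Fi={float(Fi):.3f}")
```

The last line printed is `4th: ni=6 Ni=30 fi=0.200 Fi=1.000`, that is, F4 = 1.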
The challenge is to group the data in such a way that the most significant trends
become sharply visible. There is no single best solution to this problem, as so much of it
involves personal insight and judgement.
Let's look at an example of how to calculate class intervals, class marks, widths, …
Example: The following table shows the amount of money that 50 employees
received as a bonus at the end of the year:
Money (€)           300  400  500  700  750  800  1000  1200  1500
No. of employees      5    7   10   11    6    5     3     2     1
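Before choosing the classes, it helps to know N and the range. A short Python sketch with the data from the table (the variable names are illustrative):

```python
money = [300, 400, 500, 700, 750, 800, 1000, 1200, 1500]   # bonus values xi (€)
employees = [5, 7, 10, 11, 6, 5, 3, 2, 1]                  # absolute frequencies ni

N = sum(employees)                  # total number of employees
R = max(money) - min(money)         # range: maximum value - minimum value (all ni > 0)
f = [ni / N for ni in employees]    # relative frequencies fi

print(f"N = {N}, R = {R} €")
```

It prints `N = 50, R = 1200 €`.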
$$x'_i = \frac{L_{i-1} + L_i}{2}$$
In this example, the number of classes and their widths do not follow any rule.
Later on, we will see some criteria we could use to build intervals.
Statistics offers some guidelines for transforming ungrouped distributions into grouped
distributions:
1) Use no fewer than 5 classes and no more than 20. The number of classes is decided in
a somewhat arbitrary manner. Here is a quick guide to approximating the number
of intervals.
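The guide itself is not reproduced in this text. As an assumption, one common rule of thumb is Sturges' rule, k = ⌈log2 N⌉ + 1, which the Python sketch below clamps to the 5 to 20 range of guideline 1; it may differ from the guide used in the course. The function names are hypothetical.

```python
import math


def sturges_classes(N, low=5, high=20):
    """Sturges' rule k = ceil(log2 N) + 1, clamped to the 5-20 guideline."""
    k = math.ceil(math.log2(N)) + 1
    return max(low, min(high, k))


def class_width(R, k):
    """Constant width needed for k classes to cover a range R."""
    return R / k


if __name__ == "__main__":
    # Bonus example: N = 50 employees, range R = 1500 - 300 = 1200 €
    k = sturges_classes(50)
    print(k, round(class_width(1200, k), 2))
```

For the bonus data this gives k = 7 and a width of about 171,43 €; in practice the width would be rounded up to a convenient figure (for example 200 €) so that the classes cover the whole range.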
Ordinal variable
xi           ni   fi
Very poor     2   0,10
Poor          3   0,15
Average       5   0,25
Good          6   0,30
Very good     4   0,20
Total        20   1
[Bar chart: absolute frequency (y-axis) for Very poor, Poor, Average, Good, Very good (x-axis)]
$$f_i = \frac{n_i}{N}, \qquad \alpha_i = f_i \cdot 360^\circ$$
[Pie chart: relative frequency fi by category: Very poor 10%, Poor 15%, Average 25%, Good 30%, Very good 20%]
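The angle of each sector is proportional to its relative frequency. A minimal Python check with the frequencies from the ordinal table above; computing ni·360/N rather than fi·360 is only a choice to keep the arithmetic exact.

```python
categories = ["Very poor", "Poor", "Average", "Good", "Very good"]
n = [2, 3, 5, 6, 4]                      # absolute frequencies ni
N = sum(n)                               # 20 answers

# alpha_i = f_i * 360 degrees = n_i * 360 / N
angles = [ni * 360 / N for ni in n]

for category, alpha in zip(categories, angles):
    print(f"{category}: {alpha:g} degrees")
```

The sectors measure 36°, 54°, 90°, 108° and 72°, which add up to 360°.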
xi      ni   fi
18       4   20%
19       5   25%
20       4   20%
21       3   15%
23       2   10%
25       1    5%
27       1    5%
Total   20   100%
[Bar chart: relative frequency fi (y-axis, 0% to 30%) for xi = 18, 19, 20, 21, 23, 25, 27 (x-axis)]
Another example:
xi      ni
1        2
2        3
3        4
4        3
5        6
6        8
7       14
8        9
9        4
10       2
Total   55
[Bar chart: absolute frequency ni (y-axis) for xi = 1, …, 10 (x-axis)]
[Li-1;Li)   x'i     ci   ni   fi     Ni   Fi     di
160-165     162,5   5    3    0,15   3    0,15   0,60
165-172     168,5   7    4    0,20   7    0,35   0,57
172-180     176     8    6    0,30   13   0,65   0,75
180-184     182     4    4    0,20   17   0,85   1,00
184-193     188,5   9    3    0,15   20   1      0,33
TOTAL                    20   1
Its (approximate) histogram is a graph with the classes on the X axis and a rectangle
over each class. The area of each bar is proportional to the corresponding
frequency (absolute or relative); since the classes have different widths, the height of each bar is the density di.
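For the table above, a minimal Python sketch (plotting left out) that sets each bar height to the density di, confirms that width × height gives back ni, and recomputes the class marks. The variable names are illustrative.

```python
limits = [160, 165, 172, 180, 184, 193]   # class limits from the table
n = [3, 4, 6, 4, 3]                        # absolute frequencies ni

bars = []
for lower, upper, ni in zip(limits, limits[1:], n):
    width = upper - lower                 # ci
    height = ni / width                   # di, used as the bar height
    mark = (lower + upper) / 2            # class mark x'i
    bars.append((lower, upper, mark, width, height))

for lower, upper, mark, width, height in bars:
    area = width * height                 # equals ni up to floating-point rounding
    print(f"[{lower};{upper}) x'={mark} height={height:.2f} area={area:.2f}")
```

With equal widths a bar chart of ni would have the same shape as the density histogram; here the widths range from 4 to 9, so using ni as the height would exaggerate the wider classes.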
[Histogram: density di (y-axis, 0,0 to 1,0) over the classes 160-165, 165-172, 172-180, 180-184, 184-193 (x-axis)]
Another example:
Intervals   ni
0-2          6
2-4         10
4-6         30
6-8         35
8-10        10
10-12        6
Total       97
[Histogram: absolute frequency ni (y-axis) over the intervals from 0 to 12 (x-axis, xi)]