Chapter 1
1. Basic Concepts, methods of data collection and presentation
1.1 INTRODUCTION
Definition and classifications of statistics
Definition:
We can define statistics in two ways.
1. Plural sense (layman's definition).
It is an aggregate or collection of numerical facts.
2. Singular sense (formal definition)
Statistics is defined as the science of collecting,
organizing, presenting, analyzing and interpreting
numerical data to assist in making more effective
decisions.
Classifications:
Depending on how data are used, statistics is sometimes
divided into two main areas or branches.
1. Descriptive Statistics: is concerned with summary
calculations, graphs, charts and tables.
2. Inferential Statistics: is a method used to generalize
from a sample to a population. For example, the average
income of all families (the population) in Ethiopia can be
estimated from figures obtained from a few hundred (the
sample) families.
Inferential statistics is important because statistical data
usually arise from samples, and generalizing from a sample to a
population requires statistical techniques based on probability
theory.
Lecture notes Probability and Statistics (Stat 2071)
Scales of measurement
Proper knowledge of the nature and type of the data being
dealt with is essential in order to choose and apply the
proper statistical methods for their analysis and inference.
The measurement scale refers to the properties of the values
assigned to the data, based on order, distance and a
fixed zero.
Order
Distance
Fixed Zero
SCALE TYPES
Nominal Scales
Examples:
o Country code
Ordinal Scales
Examples:
o Military status.
Interval Scales
Examples:
o IQ
o Temperature in °F.
Ratio Scales
Examples:
o Weight
o Height
o Number of students
o Age
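As a compact summary, the following Python sketch records which of the three properties (order, distance, fixed zero) each scale possesses, and checks numerically why temperature in °F is interval rather than ratio: converting to °C changes the ratio of two temperatures, but not the ratio of two differences. The names `SCALE_PROPERTIES` and `f_to_c` are illustrative assumptions, not part of these notes.

```python
# Properties (order, distance, fixed zero) possessed by each scale of measurement.
SCALE_PROPERTIES = {
    "nominal": set(),
    "ordinal": {"order"},
    "interval": {"order", "distance"},
    "ratio": {"order", "distance", "fixed zero"},
}


def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# Ratios of interval-scale values depend on the arbitrary zero point.
ratio_f = 80 / 40                    # 2.0 in degrees Fahrenheit
ratio_c = f_to_c(80) / f_to_c(40)    # about 6 in degrees Celsius

# Ratios of differences do not depend on the zero point or the unit.
diff_ratio_f = (80 - 60) / (60 - 50)
diff_ratio_c = (f_to_c(80) - f_to_c(60)) / (f_to_c(60) - f_to_c(50))

for scale, props in SCALE_PROPERTIES.items():
    print(scale, sorted(props))
print(ratio_f, round(ratio_c, 2), diff_ratio_f, round(diff_ratio_c, 2))
```

By contrast, a ratio-scale variable such as weight keeps its ratios under a change of unit: 80 kg is twice 40 kg in pounds as well.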
Having collected and edited the data, the next important step is to organize it, that is, to
present it in a condensed, readily comprehensible form from which inferences can be drawn.
It is also necessary to separate like items from unlike ones.
Data are usually presented in two main forms:
o Tabular presentation
o Diagrammatic and graphic presentation
Classification is a preliminary step that prepares the ground for proper presentation of the data.
Definitions:
Frequency distribution: is the organization of raw data in table form using classes
and frequencies.
For data that can be placed in specific categories, such as nominal or ordinal data (e.g.
marital status), a categorical frequency distribution is used.
Example: A social worker collected the following data on marital status for 25
persons. (M=married, S=single, W=widowed, D=divorced)
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital
status, M, S, D and W, and these are used as the classes of the distribution. The frequency
distribution is constructed as follows.
Step 1: Make a table and write the classes in column (1).
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tallies and place the result in column (3).
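Percentages are not normally part of a frequency distribution, but they can be added since
they are used in certain types of diagrams, such as pie charts. The percentage of a class is
(f/n) × 100, where f is the class frequency and n = 25 is the total number of observations.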
Class (1) | Tally (2) | Frequency (3) | Percent
M | //// / | 6 | 24
S | //// // | 7 | 28
D | //// // | 7 | 28
W | //// | 5 | 20
Total | | 25 | 100
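The tally-and-count procedure above can be checked with a short Python sketch, assuming only the standard library: `collections.Counter` performs the tally (Steps 2 and 3), and the percentages follow (f/n) × 100. Variable names are illustrative.

```python
from collections import Counter

# The 25 marital-status codes from the example, entered row by row.
raw = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
"""
data = raw.split()

counts = Counter(data)   # tally and count each class
n = len(data)

print("Class  Frequency  Percent")
for status in ["M", "S", "D", "W"]:
    f = counts[status]
    print(f"{status:5}  {f:9}  {f * 100 / n:7.1f}")
print(f"Total  {n:9}  {100:7.1f}")
```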
-An ungrouped frequency distribution is a table of all the potential raw score values that
could possibly occur in the data, along with the number of times each actually occurred.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
80 /// 3
85 /// 3
90 / 1
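A minimal Python sketch of an ungrouped frequency distribution for the 20 scores above follows. As an assumption for brevity, it lists only the scores that actually occur (a table of every possible score would add rows with frequency 0), and it draws tallies as a plain run of strokes rather than in groups of five.

```python
from collections import Counter

scores = [80, 76, 90, 85, 80,
          70, 60, 62, 70, 85,
          65, 60, 63, 74, 75,
          76, 70, 70, 80, 85]

freq = Counter(scores)
# One row per observed score: value, tally strokes, frequency.
for value in sorted(freq):
    print(f"{value:3}  {'/' * freq[value]:5}  {freq[value]}")
print("Total", sum(freq.values()))
```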
-When the range of the data is large, the data must be grouped into classes that are more than
one unit in width.
Definitions:
Units of measurement (U): the distance between two possible consecutive measures.
It is usually taken as 1, 0.1, 0.01, 0.001, and so on.
Class width: the difference between the upper and lower class boundaries of any
class. It is also the difference between the lower limits of any two consecutive classes
or the difference between any two consecutive class marks.
Class mark (midpoint): it is the average of the lower and upper class limits or, equivalently,
the average of the lower and upper class boundaries.
Cumulative frequency above: it is the total frequency of all values greater than or
equal to the lower class boundary of a given class.
Cumulative frequency below: it is the total frequency of all values less than or equal
to the upper class boundary of a given class.
Cumulative Frequency Distribution (CFD): it is the tabular arrangement of class
intervals together with their corresponding cumulative frequencies. It can be of the "more
than" or the "less than" type, depending on the type of cumulative frequency used.
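The definitions above can be checked with a small Python sketch. It uses the first three class limits of Example * and U = 1, confirms that the three descriptions of class width agree and that the class mark is the same whether computed from limits or boundaries, and then builds both types of cumulative frequency. The frequencies `[3, 5, 2]` are hypothetical, chosen only for illustration; the helper names `boundaries` and `mark` are likewise assumptions.

```python
from itertools import accumulate

U = 1                                     # unit of measurement
limits = [(6, 11), (12, 17), (18, 23)]    # class limits, as in Example *


def boundaries(lower, upper, u=U):
    """Class boundaries: half a unit below the lower limit and above the upper limit."""
    return lower - u / 2, upper + u / 2


def mark(lower, upper):
    """Class mark (midpoint) of a class."""
    return (lower + upper) / 2


# Width from boundaries, from consecutive lower limits and from consecutive class marks agree;
# the class mark is the same from limits or from boundaries.
for (lo, up), (next_lo, next_up) in zip(limits, limits[1:]):
    lb, ub = boundaries(lo, up)
    assert ub - lb == next_lo - lo == mark(next_lo, next_up) - mark(lo, up)
    assert mark(lo, up) == (lb + ub) / 2

freqs = [3, 5, 2]                                     # hypothetical class frequencies
cf_less = list(accumulate(freqs))                     # "less than" type: running total from the top
cf_more = list(accumulate(reversed(freqs)))[::-1]     # "more than" type: running total from the bottom
print(cf_less, cf_more)
```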
Example*:
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest values: H = 39, L = 6.
Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11.
11, 17, 23, 29, 35, 41 are the upper class limits.
So combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries by subtracting U/2 = 0.5 from each lower class limit and adding 0.5 to each upper class limit:
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 9: Write the numeric values for the tallies in the frequency column.
Class limit | Class boundary | Class mark | Tally | Freq. | Cf (less than type) | Cf (more than type) | rf | rcf (less than type)
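The complete grouped frequency distribution for Example * can be sketched in Python as follows. Since Steps 2 to 5 are not shown in these notes, the sketch assumes a class width of 6 and 6 classes, which are the values implied by the class limits listed above. It computes the limits, boundaries, class marks, frequencies, both cumulative frequencies, relative frequencies and less-than relative cumulative frequencies.

```python
from itertools import accumulate

# Example * data.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

U = 1       # unit of measurement
LOW = 6     # lowest value L from Step 1
WIDTH = 6   # assumed class width (implied by the limits 6-11, 12-17, ...)
K = 6       # assumed number of classes (implied by the six classes listed)

lowers = [LOW + i * WIDTH for i in range(K)]            # lower class limits
uppers = [lo + WIDTH - U for lo in lowers]              # next lower limit minus U (Step 6)
freqs = [sum(lo <= x <= up for x in data) for lo, up in zip(lowers, uppers)]

n = sum(freqs)
cf_less = list(accumulate(freqs))                       # less than type
cf_more = [n - c + f for c, f in zip(cf_less, freqs)]   # more than type
rf = [f / n for f in freqs]                             # relative frequency
rcf_less = [c / n for c in cf_less]                     # relative cumulative frequency

print("Limit   Boundary     Mark   f  cf<  cf>    rf  rcf<")
for lo, up, f, cl, cm, r, rc in zip(lowers, uppers, freqs, cf_less, cf_more, rf, rcf_less):
    print(f"{lo:2}-{up:2}  {lo - U / 2:4.1f}-{up + U / 2:4.1f}  {(lo + up) / 2:4.1f}  "
          f"{f:2}  {cl:3}  {cm:3}  {r:4.2f}  {rc:4.2f}")
```

The tally column is omitted here; the frequency column carries the same information.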
Diagrammatic and graphic presentations are techniques for presenting data in visual displays using geometric shapes and pictures.
Importance:
-The three most commonly used diagrammatic presentations for discrete as well as qualitative
data are:
Pie charts
pictogram
Bar charts
Pie chart
Solutions:
Step 3: Using a protractor and compass, draw each section and write its name and corresponding
percentage.
Class | Frequency | Percent | Degrees
Men | 2500 | 25 | 90
Women | 2000 | 20 | 72
Boys | 1500 | 15 | 54
[Figure: pie chart by CLASS (Men, Women, Boys, Girls)]
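The percentage and angle of each pie sector follow from percent = (f/n) × 100 and degrees = (f/n) × 360, i.e. percent × 3.6. The Python sketch below assumes a total of n = 10 000, which is what Men = 2500 being 25% implies; the function name `pie_sector` is hypothetical.

```python
def pie_sector(freq, total):
    """Percentage share and central angle (in degrees) of one pie-chart sector."""
    percent = freq * 100 / total
    degrees = freq * 360 / total   # equivalently percent * 3.6
    return percent, degrees


TOTAL = 10_000   # assumption: implied by 2500 being 25% of the total
for label, f in [("Men", 2500), ("Women", 2000), ("Boys", 1500)]:
    pct, deg = pie_sector(f, TOTAL)
    print(f"{label:6} {f:5}  {pct:5.1f}%  {deg:6.1f} deg")
```

Multiplying before dividing keeps these particular results exact in floating point.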
Pictogram
-In these diagrams, we represent data by means of picture symbols. We choose a suitable
picture to represent a definite number of units of the variable being measured.
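The rule "one picture represents a definite number of units" can be sketched in Python as a text pictogram. The key of 500 persons per symbol, the `*` symbol and the `~` mark for a partial symbol are hypothetical choices made for illustration; the values are those of the pie-chart example.

```python
def pictogram_row(value, units_per_symbol, symbol="*", partial="~"):
    """One pictogram row: a symbol per full group of units, plus a partial mark for any remainder."""
    full, rest = divmod(value, units_per_symbol)
    return symbol * full + (partial if rest else "")


# Hypothetical key: one symbol = 500 persons.
for label, value in [("Men", 2500), ("Women", 2000), ("Boys", 1500)]:
    print(f"{label:6} {pictogram_row(value, 500)}")
```

In a drawn pictogram the remainder would usually be shown as a fraction of a picture rather than a separate mark.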
Bar Charts:
- A set of bars (thick lines or narrow rectangles) representing some magnitude over
time or space.
- They are useful for comparing aggregates over time or space.
- Bars can be drawn either vertically or horizontally.
- There are different types of bar charts. The most common are:
Simple bar chart
Deviation or two-way bar chart
Broken bar chart
Component or subdivided bar chart.
Multiple bar charts.
C 24 35 54
Solutions:
[Figure: bar chart; x-axis: product (A, B, C); y-axis: Sales in $]
[Figure: bar chart; x-axis: Year of production (1957, 1958, 1959); y-axis: Sales in $; series: Product A, Product B, Product C]
- These are used to display data on more than one variable.
- They are used for comparing different variables at the same time.
Example:
Draw a component bar chart to represent the sales by product from 1957 to 1959.
Solutions:
[Figure: component bar chart; x-axis: Year of production (1957, 1958, 1959); y-axis: Sales in $; series: Product A, Product B, Product C]
Histogram
A graph which displays the data by using vertical bars of various heights to represent frequencies.
Class boundaries are placed along the horizontal axis. Class marks and class limits are sometimes
used as the quantity on the x-axis.
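The bar heights of a histogram for the Example * data can be sketched in Python using the class boundaries from Step 7: each height is the number of observations falling between two consecutive boundaries. As a simplifying assumption, the sketch prints the histogram sideways as text instead of plotting it.

```python
# Example * data and its class boundaries (Step 7).
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
bounds = [5.5, 11.5, 17.5, 23.5, 29.5, 35.5, 41.5]

# Height of each bar = number of observations between consecutive boundaries.
heights = [sum(lb < x < ub for x in data) for lb, ub in zip(bounds, bounds[1:])]

for lb, ub, h in zip(bounds, bounds[1:], heights):
    print(f"{lb:4.1f}-{ub:4.1f} | {'#' * h}")
```

Because boundaries fall half a unit between possible values, no observation can land exactly on a boundary, so adjacent bars never compete for the same value.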
Example: Draw a frequency polygon for the above data (example *).
Solutions:
[Figure: frequency polygon; axes: Value and Frequency; class marks 2.5, 8.5, 14.5, 20.5, 26.5, 32.5, 38.5, 44.5 on the x-axis]
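The points of the frequency polygon can be sketched in Python: each class mark is paired with its class frequency, and the polygon is closed by adding an empty class (frequency 0) one class width before the first class and after the last. This reproduces the x-values 2.5 to 44.5 used on the axis of the polygon. The sketch assumes the class width of 6 implied by the class limits of Example *.

```python
# Example * data and its lower class limits.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
U, WIDTH = 1, 6
lowers = [6, 12, 18, 24, 30, 36]

marks = [lo + (WIDTH - U) / 2 for lo in lowers]   # (lower + upper) / 2
freqs = [sum(lo <= x <= lo + WIDTH - U for x in data) for lo in lowers]

# Close the polygon with one empty class before the first and after the last class.
xs = [marks[0] - WIDTH] + marks + [marks[-1] + WIDTH]
ys = [0] + freqs + [0]

for x, y in zip(xs, ys):
    print(x, y)
```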