Business Statistics CH (6)

The document discusses data collection and presentation, detailing two types of data: qualitative and quantitative. It explains the differences between primary and secondary data, their advantages and disadvantages, and methods for collecting primary data. Additionally, it covers frequency distributions, including ungrouped and grouped types, and various ways to present data, such as histograms and cumulative frequency curves.

Uploaded by Teferi Geta
Copyright © All Rights Reserved

UNIT TWO

DATA COLLECTION & PRESENTATION
Types of Data
Data sets can consist of two types of data: qualitative data and quantitative data.
▪ Qualitative data consist of attributes, labels, or nonnumeric entries.
▪ Quantitative data consist of numerical measurements or counts.
Data Collection
▪ Data collection is a systematic and meaningful assembly of information for the accomplishment of the objective of a statistical investigation.
▪ It refers to the methods used in gathering the required information from the units under investigation.
PRIMARY AND SECONDARY DATA
PRIMARY DATA/ SOURCES
▪ A primary source is a source from which first-hand information is gathered.
▪ Primary sources are original sources of data.

SECONDARY DATA
▪ A secondary source is one that makes available data which were collected by some other agency.
▪ A source which is not primary is necessarily a secondary source.
▪ Secondary data are obtained from such sources as census and survey reports, books, official records, reported experimental results, previous research papers, bulletins, magazines, newspapers, web sites, and other publications.
EXAMPLE
➢ A study is conducted to determine the age distribution of citizens living with HIV/AIDS.
▪ Information obtained directly from these citizens comes from a primary source.
▪ Using the records of hospitals and other related agencies to obtain the ages of these citizens, without tracing them personally, is a secondary source.
Advantages and Disadvantages of Primary & Secondary data
Advantages of primary data over secondary data:
▪ Gives more reliable, accurate and adequate information.
▪ Shows data in greater detail.
▪ Free from errors that may arise from copying figures from publications, which is the case with secondary data.
DISADVANTAGES OF PRIMARY DATA
▪ It is time consuming and costly.
▪ It may give misleading information due to lack of integrity of investigators and non-cooperation of respondents.
ADVANTAGES OF SECONDARY DATA
• It is readily available, and hence convenient and much quicker to obtain.
• It reduces time, cost and effort as compared to primary data.
• It may be available in cases where it is impossible to collect primary data, such as regions where there is war.

DISADVANTAGES OF SECONDARY DATA
▪ Data obtained may not be sufficiently accurate.
▪ Data that exactly suit our purpose may not be found.
▪ Errors may be made while copying figures.


Methods of collecting primary data
1. Personal Enquiry Method (Interview method)
A. Direct Personal Interview: there is face-to-face contact with the persons from whom the information is to be obtained.
B. Indirect Personal Enquiry (Interview): the investigator contacts third parties, called witnesses, who are capable of supplying the necessary information.
2. Direct Observation
3. Questionnaire method
FREQUENCY DISTRIBUTION
▪ Frequency refers to the number of times a certain value occurs in a data set.
▪ A frequency distribution is the organization of raw data in table form, using classes and frequencies.
▪ The tabular representation of the values of a variable together with their corresponding frequencies is called a Frequency Distribution (FD).
A. Ungrouped Frequency Distribution (UFD)
▪ Shows a distribution where the values of a variable are linked with their respective frequencies.
▪ Example: Consider the number of children in 15 families.

No. of Children (Values)   Tallies   No. of Families (Frequency)
0                          //        2
1                          ////      4
2                          ////      4
3                          ///       3
4                          //        2
Total                                15
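As a minimal sketch, the ungrouped frequency distribution above can be tallied in Python with `collections.Counter`. The raw list is simply the 15 observations written out from the table; their original order is not given, and it does not affect the counts.

```python
from collections import Counter

# Number of children in each of the 15 families, rebuilt from the table above
# (the original order of the observations is unknown and does not matter).
children = [0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]

# Counter maps each distinct value to the number of times it occurs.
freq = Counter(children)

print("Value  Frequency")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]:>9}")
print(f"Total  {sum(freq.values()):>9}")
```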
B. Grouped Frequency Distribution (GFD)
▪ If the mass of data is very large, it is necessary to condense the data into an appropriate number of classes (groups of values of the variable) and indicate the number of observed values that fall into each class.
▪ A GFD is a frequency distribution where the values of a variable are grouped into classes, each paired with the number of observations in that class.

Example*:
Values (xi)      1 – 25   26 – 50   51 – 75   76 – 100
Frequency (fi)   3        10        18        6
COMMON TERMINOLOGIES IN A GFD
i. Class: a group of values of a variable between two specified numbers called the lower class limit (LCL) and the upper class limit (UCL).

Class limits (CL): separate one class from another. The limits can actually appear in the data, and there is a gap between the upper limit of one class and the lower limit of the next class.
In Example*, the GFD contains four classes: 1 – 25, 26 – 50, 51 – 75, and 76 – 100.
Class boundaries: separate one class in a grouped frequency distribution from the next. A boundary has one more decimal place than the raw data.
• There is no gap between the upper boundary of one class and the lower boundary of the succeeding class.
• Boundaries are obtained by subtracting half of the unit of measurement (u) from the lower limits and adding ½u to the upper limits of a class. u can take values 1, 0.1, 0.01, 0.001, …
i.e. $UCB_i = UCL_i + \tfrac{1}{2}u$ and $LCB_i = LCL_i - \tfrac{1}{2}u$,
where $UCB_i$ is the upper class boundary and $LCB_i$ is the lower class boundary of the i-th class.
ii. Class Frequency (or simply Frequency): the number of observations corresponding to a class.
In Example*, the class frequencies of the 1st, 2nd, 3rd and 4th classes are 3, 10, 18 and 6, respectively.
Note: The unit of measurement (u) is the gap between any two successive classes, i.e.
$u = LCL_{i+1} - UCL_i$ (the lower limit of a class minus the upper limit of the preceding class).
In Example*, consider the 2nd class, 26 – 50. Since u = 26 – 25 = 1,
$LCL_2 = 26$, $UCL_2 = 50$
$LCB_2 = 26 - \tfrac{1}{2}(1) = 25.5$, $UCB_2 = 50 + \tfrac{1}{2}(1) = 50.5$
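The boundary rule can be checked with a short Python sketch. The helper name `class_boundaries` is hypothetical; it assumes inclusive classes given as (lower limit, upper limit) pairs and a known unit of measurement u.

```python
def class_boundaries(classes, u):
    """Return (LCB, UCB) for each inclusive class (LCL, UCL).

    LCB = LCL - u/2 and UCB = UCL + u/2, where u is the unit of measurement.
    """
    half = u / 2
    return [(lcl - half, ucl + half) for lcl, ucl in classes]

# Classes of Example*, measured in whole units (u = 1).
example_classes = [(1, 25), (26, 50), (51, 75), (76, 100)]
bounds = class_boundaries(example_classes, u=1)
for (lcl, ucl), (lcb, ucb) in zip(example_classes, bounds):
    print(f"{lcl}-{ucl}: {lcb}-{ucb}")
```

For the 2nd class this gives 25.5 – 50.5, matching the hand calculation.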

iv. Class Width (size of a class or class interval): the difference between the upper and lower class boundaries of a class. For exclusive classes this equals the difference between the upper and lower class limits; for inclusive classes the difference between the limits is one unit u less than the width (e.g. for 26 – 50, 50 – 26 = 24, while 50.5 – 25.5 = 25).
Remarks:
1. If both the LCL and UCL are included in a class, it is called an inclusive class. For inclusive classes,
class width $cw = UCB_i - LCB_i$
2. If the LCL is included and the UCL is not included in a class, it is called an exclusive class. For exclusive classes,
class width $cw = UCL_i - LCL_i$
To be consistent, we use inclusive classes.
v. Class Mark (cm): the midpoint (center) of a class, i.e.
$cm_i = \frac{LCL_i + UCL_i}{2} = \frac{LCB_i + UCB_i}{2}$
Note: the difference between any two successive class marks is equal to the class width.
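A minimal Python sketch of the class mark and class width, again using the classes of Example* with u = 1. Each mark is the average of the class limits, each width is UCB – LCB, and successive marks differ by the class width (here 25).

```python
example_classes = [(1, 25), (26, 50), (51, 75), (76, 100)]
u = 1  # unit of measurement (whole numbers)

# Class mark: midpoint of the class limits.
marks = [(lcl + ucl) / 2 for lcl, ucl in example_classes]

# Class width of an inclusive class: UCB - LCB = (UCL + u/2) - (LCL - u/2).
widths = [(ucl + u / 2) - (lcl - u / 2) for lcl, ucl in example_classes]

# Differences between successive class marks.
mark_gaps = [b - a for a, b in zip(marks, marks[1:])]

print("Class marks:", marks)
print("Class widths:", widths)
print("Gaps between marks:", mark_gaps)
```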
Range (R): the difference between the largest (L) and the smallest (S) values in a data set, i.e. $R = L - S$.
RULES FOR FORMING A GROUPED FREQUENCY DISTRIBUTION

To construct a GFD, the following points should be considered.
1. The classes should be clearly defined, that is, each observation should fall into one and only one class.
2. The number of classes should be neither too large nor too small. Normally, 5 to 20 classes are recommended.
3. All the classes should be of the same width. An approximate suitable class width can be obtained as
$cw \approx \frac{R}{n}$,
where R is the range and n is the number of classes. A suitable number of classes can be obtained by using the formula
$n \approx 1 + 3.322\log_{10} N$,
rounded up or down to the nearest whole number, where N is the total number of observations.
▪ Alternatively, n can be determined as the smallest whole number satisfying $2^n \ge N$,
where
n = number of classes
N = total number of observations
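The two rules of thumb for the number of classes can be compared with a short Python sketch; the function names are hypothetical. The first is Sturges' rule, n ≈ 1 + 3.322 log₁₀ N, rounded to the nearest whole number; the second picks the smallest n with 2ⁿ ≥ N.

```python
import math

def sturges_classes(N):
    """Approximate number of classes: n = 1 + 3.322 * log10(N), rounded."""
    return round(1 + 3.322 * math.log10(N))

def power_of_two_classes(N):
    """Smallest whole number n such that 2**n >= N."""
    n = 1
    while 2 ** n < N:
        n += 1
    return n

for N in (30, 100, 500):
    print(N, sturges_classes(N), power_of_two_classes(N))
```

The two rules need not agree: for N = 30 Sturges' rule gives 6 classes, while 2⁵ = 32 ≥ 30 gives 5.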
4. Determine the class limits.
▪ Determine the lower class limit of the first class ($LCL_1$), then
• $LCL_2 = LCL_1 + cw$, $LCL_3 = LCL_2 + cw$, …, $LCL_{i+1} = LCL_i + cw$
▪ Determine the upper class limit of the first class ($UCL_1$), i.e.
$UCL_1 = LCL_1 + cw - u$,
▪ where u is the unit of measurement; then
$UCL_2 = UCL_1 + cw$, $UCL_3 = UCL_2 + cw$, …, $UCL_{i+1} = UCL_i + cw$
▪ Complete the GFD with the respective class frequencies.
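The steps in rule 4 can be sketched in Python as follows. `class_limits` is a hypothetical helper; it assumes the lower limit of the first class, the class width cw, the unit u and the number of classes n have already been chosen.

```python
def class_limits(lcl1, cw, u, n):
    """Build n inclusive classes (LCL_i, UCL_i).

    UCL_1 = LCL_1 + cw - u, and each later limit is the previous one plus cw.
    """
    limits = []
    lcl, ucl = lcl1, lcl1 + cw - u
    for _ in range(n):
        limits.append((lcl, ucl))
        lcl, ucl = lcl + cw, ucl + cw
    return limits

print(class_limits(lcl1=16, cw=10, u=1, n=6))
# → [(16, 25), (26, 35), (36, 45), (46, 55), (56, 65), (66, 75)]
```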


• Example. The number of customers in a supermarket on 30 consecutive days was recorded as follows:
20 48 65 25 48 49
35 25 72 42 22 58
53 42 23 57 65 37
18 65 37 16 39 42
49 68 69 63 29 67

A. Construct a GFD with a suitable number of classes.
B. Complete the distribution obtained in (A) with class boundaries and class marks.
Solution: i. Range = largest value – smallest value = 72 – 16 = 56
N = 30 (total number of observations)
Number of classes: $n \approx 1 + 3.322\log_{10} 30 = 1 + 3.322(1.4771) \approx 5.9$
• Hence a suitable number of classes is chosen to be n = 6.
▪ Class width: $cw \approx R/n = 56/6 \approx 9.33$
▪ For the sake of convenience, take cw to be 10. (It is also possible to choose cw = 9, but then six classes starting at 16 would end at 69, so a seventh class would be needed to include the largest value, 72.)
• Take the lower limit of the 1st class to be $LCL_1 = 16$ and u = 1, i.e.
$UCL_1 = LCL_1 + cw - u = 16 + 10 - 1 = 25$
$LCL_2 = LCL_1 + cw = 16 + 10 = 26$, $UCL_2 = UCL_1 + cw = 25 + 10 = 35$
$LCL_3 = LCL_2 + cw = 26 + 10 = 36$, $UCL_3 = UCL_2 + cw = 35 + 10 = 45$, and so on.
• Therefore, the GFD would be:


A)
Class (xi)   Frequency (fi)
16 – 25      7
26 – 35      2
36 – 45      6
46 – 55      5
56 – 65      6
66 – 75      4

B)
Class (xi)   Frequency (fi)   Class boundaries (CBi)   Class mark (cmi)
16 – 25      7                15.5 – 25.5              20.5
26 – 35      2                25.5 – 35.5              30.5
36 – 45      6                35.5 – 45.5              40.5
46 – 55      5                45.5 – 55.5              50.5
56 – 65      6                55.5 – 65.5              60.5
66 – 75      4                65.5 – 75.5              70.5
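To check tables A and B, here is a short Python sketch that sorts the 30 daily customer counts into the six inclusive classes (cw = 10, u = 1, starting at the smallest value) and computes each class's boundaries and class mark.

```python
customers = [
    20, 48, 65, 25, 48, 49,
    35, 25, 72, 42, 22, 58,
    53, 42, 23, 57, 65, 37,
    18, 65, 37, 16, 39, 42,
    49, 68, 69, 63, 29, 67,
]

u, cw, n = 1, 10, 6
lcl1 = min(customers)  # LCL_1 = 16, the smallest observation

rows = []
for i in range(n):
    lcl = lcl1 + i * cw
    ucl = lcl + cw - u
    freq = sum(1 for x in customers if lcl <= x <= ucl)  # inclusive class
    lcb, ucb = lcl - u / 2, ucl + u / 2
    cm = (lcl + ucl) / 2
    rows.append((lcl, ucl, freq, lcb, ucb, cm))
    print(f"{lcl}-{ucl}  f={freq}  CB={lcb}-{ucb}  cm={cm}")

# Every observation must fall in exactly one class.
assert sum(r[2] for r in rows) == len(customers)
```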


CUMULATIVE FREQUENCY DISTRIBUTION (CFD)
▪ Cumulative frequency (CF): the number of observations less than the upper class boundary, or greater than the lower class boundary, of a class.
▪ ‘Less Than’ Cumulative Frequency Distribution (<CFD): gives, for each class, the number of values less than the upper class boundary of that class.
▪ ‘More Than’ Cumulative Frequency Distribution (>CFD): gives, for each class, the number of values greater than the lower class boundary of that class.
Example: Consider the frequency distribution given below.

Class (xi)   Frequency (fi)   Less than cumulative     More than cumulative
                              frequency (<cfi)         frequency (>cfi)
3 – 6        4                4                        30
7 – 10       7                11                       26
11 – 14      10               21                       19
15 – 18      6                27                       9
19 – 22      3                30                       3

This means that, from the ‘less than’ cumulative frequency distribution, there are 4 observations less than 6.5, 11 observations below 10.5, etc., and from the ‘more than’ cumulative frequency distribution, 30 observations are above 2.5, 26 above 6.5, etc.
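Both cumulative columns are running totals, which a short Python sketch can compute with `itertools.accumulate`: the ‘less than’ column accumulates from the first class down, and the ‘more than’ column accumulates from the last class up.

```python
from itertools import accumulate

classes = ["3-6", "7-10", "11-14", "15-18", "19-22"]
freqs = [4, 7, 10, 6, 3]

# 'Less than' CF: running total from the first class downward.
less_than = list(accumulate(freqs))
# 'More than' CF: running total from the last class upward, then reversed.
more_than = list(accumulate(reversed(freqs)))[::-1]

for c, f, lt, mt in zip(classes, freqs, less_than, more_than):
    print(f"{c:>6} {f:>3} {lt:>4} {mt:>4}")
```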
PRESENTATION OF DATA
• Presentation is a statistical procedure of arranging data and putting them in the form of tables, graphs, charts and/or diagrams.

HISTOGRAM
• A histogram consists of a series of adjacent rectangles whose bases are equal to the class widths of the corresponding classes and whose heights are proportional to the corresponding class frequencies.
• The class boundaries are marked along the x-axis and the class frequencies along the y-axis.
• It describes the shape (e.g. symmetry) of the data and shows where most of the data values lie.
• Example: A histogram representing the following data.

Class limits   15-24   25-34   35-44   45-54   55-64   65-74   75-84
Frequency      3       4       10      15      12      4       2

[Figure: Histogram of the data above, with frequency on the vertical axis]
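A graphics library would normally draw the histogram. As a dependency-free sketch, the Python below derives the class boundaries (assuming u = 1) that form the bar edges and prints a sideways text histogram with one `#` per observation.

```python
limits = [(15, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74), (75, 84)]
freqs = [3, 4, 10, 15, 12, 4, 2]
u = 1  # unit of measurement

# Bar edges are the class boundaries, so adjacent bars touch with no gaps.
edges = [(lcl - u / 2, ucl + u / 2) for lcl, ucl in limits]

for (lcb, ucb), f in zip(edges, freqs):
    print(f"{lcb:>5}-{ucb:<5} | {'#' * f} ({f})")
```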
FREQUENCY POLYGON
• It is a line graph of a frequency distribution.
• It illustrates the shape of the data more clearly than a histogram does.
• It connects the centers (class marks) of the tops of the histogram bars with a series of straight lines.
[Figure: Frequency polygon; horizontal axis: class mark (9.5 to 89.5), vertical axis: frequency]
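The polygon's horizontal axis runs from 9.5 to 89.5, one class width beyond the first and last class marks of the histogram example. That suggests it is closed by adding an empty class at each end so that it starts and ends on the horizontal axis. A Python sketch of the plotted points for the histogram data, under that assumption (cw = 10):

```python
limits = [(15, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74), (75, 84)]
freqs = [3, 4, 10, 15, 12, 4, 2]
cw = 10  # class width

# Class marks: midpoints of the class limits (19.5, 29.5, ..., 79.5).
marks = [(lcl + ucl) / 2 for lcl, ucl in limits]

# Close the polygon with zero-frequency classes one width beyond each end.
xs = [marks[0] - cw] + marks + [marks[-1] + cw]
ys = [0] + freqs + [0]

points = list(zip(xs, ys))
print(points)
```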
CUMULATIVE FREQUENCY CURVE (OGIVE)
• It is useful for determining the number of values below or above some particular value.
• It uses class boundaries along the horizontal axis and cumulative frequencies along the vertical axis.
• There are two types of ogive, namely the less-than ogive and the more-than ogive.
[Figure: The less-than ogive and the more-than ogive; horizontal axis: class boundaries (14.5 to 84.5), vertical axis: cumulative frequency]
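The ogives' horizontal axis (14.5 to 84.5) matches the class boundaries of the histogram example, so, assuming the ogives plot that data, their points can be sketched in Python as follows. The less-than ogive pairs each boundary with the number of values below it; the more-than ogive pairs it with the number of values above it.

```python
from itertools import accumulate

freqs = [3, 4, 10, 15, 12, 4, 2]
boundaries = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5]

# Less-than ogive: values below each boundary (0 below the first one).
less_than = [0] + list(accumulate(freqs))
# More-than ogive: values above each boundary = total minus values below.
total = sum(freqs)
more_than = [total - c for c in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:>5}  {lt:>3}  {mt:>3}")
```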


LINE GRAPH
Example: Draw a line graph for the following time series.
Year 1986 1987 1988 1989 1991
Values 20 10 30 15 1

[Figure: A line graph showing the above time series; horizontal axis: year (1986 to 1991), vertical axis: values]
VERTICAL LINE GRAPH
• It is a graphical representation of discrete data and their frequencies.
• Vertical solid lines are used to indicate the frequencies.
• Example: Draw a vertical line graph for the following data.
Family               A   B   C   D   E
Number of children   2   1   5   4   3
BAR CHART (BAR DIAGRAM)
• Histograms, frequency polygons and ogives are used for data measured on an interval or ratio scale.
• A bar chart is a series of equally spaced bars of uniform width, where the height (length) of a bar represents the frequency corresponding to a category.
• Bars may be drawn horizontally or vertically. Vertical bar graphs are preferred, as they allow easy comparison with other bars.
• Example: The revenue (in millions of Birr) of company X from 1980 to 1982 is given below, together with the number of quintals (in thousands) of wheat and maize produced in the same years.

Year   Revenue
1980   50
1981   150
1982   200

Year   Maize   Wheat
1980   40      80
1981   20      60
1982   60      100

[Figure: A simple bar chart showing the revenues of company X from 1980 to 1982 (horizontal axis: year, vertical axis: revenue), and a multiple bar chart of the number of quintals (in thousands) of wheat and maize production (horizontal axis: year, vertical axis: number of quintals)]
SUBDIVIDED BAR CHART
Example: The number of quintals of wheat and maize produced by country X
Year   Wheat   Maize
1980   150     150
1981   300     200
1982   350     100

[Figure: Subdivided bar chart of the number of quintals of wheat and maize produced by country X; horizontal axis: year, vertical axis: number of quintals]

Example: percentage bar chart
Year   % of Wheat Production   % of Maize Production
1980   150/300 × 100 = 50      150/300 × 100 = 50
1981   300/500 × 100 = 60      200/500 × 100 = 40
1982   350/450 × 100 ≈ 78      100/450 × 100 ≈ 22

[Figure: Percentage of wheat and maize production from 1980 to 1982; horizontal axis: year, vertical axis: percentage produced]
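The percentage bar chart rescales each year's subdivided bar to a total of 100. A minimal Python sketch of that calculation for the wheat and maize data:

```python
production = {  # quintals of wheat and maize produced by country X
    1980: {"wheat": 150, "maize": 150},
    1981: {"wheat": 300, "maize": 200},
    1982: {"wheat": 350, "maize": 100},
}

percentages = {}
for year, crops in production.items():
    total = sum(crops.values())  # full height of the subdivided bar
    percentages[year] = {crop: 100 * q / total for crop, q in crops.items()}
    shares = ", ".join(f"{c} {p:.0f}%" for c, p in percentages[year].items())
    print(year, total, shares)
```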
PIE CHART
• A pie chart is a circle that is divided into sections according to the percentage of frequencies in each category of the distribution.
• Example: The monthly expenditure of a certain family is given below.

Items           Expenditure   % Proportion (Pfi)     Degrees (360° × Rfi)
Clothing        100           100/1000 × 100 = 10    100/1000 × 360° = 36°
Food            350           350/1000 × 100 = 35    350/1000 × 360° = 126°
House Rent      250           250/1000 × 100 = 25    250/1000 × 360° = 90°
Miscellaneous   300           300/1000 × 100 = 30    300/1000 × 360° = 108°
Total           1000          100%                   360°


Solution: The pie chart for the above expenditure is as follows.

[Figure: Pie chart of the monthly expenditure, with sectors for Food (350), Miscellaneous (300), House rent (250) and Clothing (100)]
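The percentage and degree columns of the expenditure table can be reproduced with a short Python sketch. Each sector's angle is its relative frequency times 360°. Multiplying before dividing keeps these particular values exact in floating point.

```python
expenditure = {"Clothing": 100, "Food": 350, "House Rent": 250, "Miscellaneous": 300}
total = sum(expenditure.values())

# Percentage = 100 * amount / total; angle = 360 * amount / total degrees.
sectors = {}
for item, amount in expenditure.items():
    pct = 100 * amount / total
    deg = 360 * amount / total
    sectors[item] = (pct, deg)
    print(f"{item:<14} {amount:>5} {pct:>5.1f}% {deg:>6.1f} deg")

print("Total degrees:", sum(deg for _, deg in sectors.values()))
```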
PICTOGRAPH (PICTOGRAM)
• A pictograph is a graph that uses symbols or pictures to represent data.
• Example: In comparing the population of a country from 1990 to 1992, we simply draw pictures of people, where each picture may represent 1,000,000 people.

[Figure: Pictograph of the population for 1990, 1991 and 1992; key: one symbol = 1,000,000 people]
