Chapter 1-2 Basic Stat.docx NEW (1) (1)
1 INTRODUCTION
Statistics is the science of conducting studies to collect, organize, summarize, analyze, and draw
conclusions from data. Its meaning can be divided into two different senses: the plural sense and
the singular sense.
Plural sense (statistical data): statistics is defined as aggregates of numerically expressed facts
or figures collected in a systematic manner for a pre-determined purpose.
Singular sense (statistical methods): statistics is defined as the science of collecting, organizing,
presenting, analyzing and interpreting numerical data in order to make sound decisions on the basis
of such analysis.
Classification of statistics:
There are two broad classifications of statistics. These are:
Descriptive statistics consists of the collection, organization, summarization, and presentation of
data.
In descriptive statistics the statistician tries to describe a situation.
Inferential statistics consists of generalizing from samples to populations, performing estimations
and hypothesis tests, determining relationships among variables, and making predictions. For
example, the average income of all families (the population) in Ethiopia can be estimated from
figures obtained from a few hundred (the sample) families.
In the singular sense, statistics is defined as the process of data collection, data organization
(classification), data presentation, data analysis, and data interpretation. We therefore consider the
following stages of a statistical investigation.
Data Collection: This is the stage where we gather information for our purpose. This can be done
through interviews, questionnaires and observation.
Data Organization: This is the stage where we edit our data, because the collected data may contain
irrelevant figures, incorrect facts, omissions and mistakes.
Data Presentation: The organized data can now be presented in the form of tables and diagrams.
At this stage large data sets are presented in a summarized and condensed manner. Graphs, tables
and diagrams may be used to make the presentation attractive.
Data Analysis: This is the stage where we critically study the data to draw conclusions about the
population parameter. The purpose of data analysis is to dig out information useful for decision
making.
Data Interpretation: This is the stage where we draw valid conclusions from the results obtained
through data analysis. Interpretation means drawing conclusions from the data, which form the
basis for decision making.
Population: is the totality of all individuals, objects or items under consideration. Example: all of
the students at AKU.
Sample: is a group of subjects selected from a population to draw conclusions about the
population.
Sampling: The process of selecting a sample from the population is called sampling.
Sample survey: The technique of collecting information from a portion of the population.
Census survey: A survey that includes every member of the population.
Parameter: It is a descriptive measure (value) computed from the population. E.g. population
mean, population standard deviation, etc.
Statistic: It is a measure used to describe the sample. It is a value computed from the sample.
Data: a collection of related facts and figures from which conclusions can be drawn.
Variable: A characteristic that changes from object to object or from time to time.
Qualitative variables are variables that can be classified into two or more non-numerical
categories. They are non-numeric and cannot be measured. Examples: sex (M or F), blood
type, marital status, religion, etc.
Quantitative variables are numerical and can be ordered or ranked. Examples include age, height,
weight, body temperature and the number of students in a class.
Quantitative variables can be further classified into two groups: discrete and continuous.
Discrete variables: can be assigned values such as 0, 1, 2, 3 and are said to be countable.
Examples: number of children in a family, the number of students in a classroom, and the number
of calls received by a switchboard operator each day for a month.
Continuous variables: can assume an infinite number of values between any two specific values.
They are obtained by measuring. For example, temperature is a continuous variable, since it can
assume an infinite number of values between any two given temperatures. Age, height, weight,
etc. are also continuous variables.
Application of statistics:
Research work.
Providing an important tool for cost and budgetary management.
Estimating quality standards for industrial products.
Determining the reliability of a product, and many other research areas.
Limitations of statistics:
Statistics doesn't deal with single (individual) values.
Statistics can't deal with qualitative characteristics directly: it only deals with data that can be
quantified. For example, it does not deal with marital status (married, single, divorced, widowed)
as such, but with the number of married, single and divorced people.
Statistical conclusions are true only in the majority of cases: the conclusions drawn from the
analysis of a sample may differ from the conclusions that would be drawn from the entire
population. For this reason, statistics is not an exact science.
Statistical interpretation requires a high degree of skill and understanding of the
subject. Besides, honesty is very important in the use of statistics.
Example: Among the 1985 E.C. accounting graduates at MBC, more than 80 percent of the females
graduated with a GPA above 2.50. Therefore, females are better in Accounting than in any other
field. The given information is not sufficient to support this conclusion, because:
1) The data are taken from 1985 E.C. only and do not include the performance of females in the
other departments.
2) The data do not give the female-to-male proportion; there may have been only two female
students in the Accounting department who graduated that year, both with a GPA above 2.50.
1.6 Scales of Measurement: Based on the scale of measurement, data can be divided into four levels.
Nominal Scale: The level of measurement that classifies data into mutually exclusive, all-inclusive
categories in which no order or ranking can be imposed on the data. For example, the sex of an
individual may be male or female, and there is no natural ordering of the two sexes. Other examples
include religion, blood type, eye color and marital status.
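Ordinal Scale: The ordinal level of measurement classifies data into categories that can be
ranked; however, precise differences between the ranks do not exist. Example: letter grades
(A, B, C, D, F), where A ranks above B but the difference between them is not measured.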
Interval Scale: classifies data into categories that can be ranked or ordered, and the distance
(magnitude) between two values is known and meaningful. However, there is no true zero point.
Interval data can be added or subtracted, but they cannot meaningfully be multiplied or divided.
Example: A temperature of zero degrees does not indicate a lack of heat, so zero is an arbitrary
point on the scale.
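To see why multiplying or dividing interval data is not meaningful, the short Python sketch below converts two temperatures from Celsius to Fahrenheit. The temperatures 10 and 20 degrees Celsius are illustrative values chosen here, not data from the text; the function name `celsius_to_fahrenheit` is likewise hypothetical.

```python
# Sketch: differences survive a change of interval scale, ratios do not.
# The two temperatures are illustrative values, not data from the notes.

def celsius_to_fahrenheit(c: float) -> float:
    """Convert a Celsius temperature to Fahrenheit (F = C * 9/5 + 32)."""
    return c * 9 / 5 + 32

t1_c, t2_c = 10.0, 20.0
t1_f, t2_f = celsius_to_fahrenheit(t1_c), celsius_to_fahrenheit(t2_c)

# Differences are meaningful: 10 degrees C and 18 degrees F are the same gap
diff_c = t2_c - t1_c
diff_f = t2_f - t1_f

# Ratios are not: 20 C looks "twice" 10 C, but 68 F is not twice 50 F,
# because the zero point of each scale is arbitrary
ratio_c = t2_c / t1_c
ratio_f = t2_f / t1_f

print(f"Celsius:    {t1_c} -> {t2_c}, difference {diff_c}, ratio {ratio_c:.2f}")
print(f"Fahrenheit: {t1_f} -> {t2_f}, difference {diff_f}, ratio {ratio_f:.2f}")
```

On a ratio scale (for example, height in centimetres versus inches) the ratio would be the same in both units, because both share a true zero.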
Ratio Scale: The ratio level of measurement possesses all the characteristics of interval
measurement, and there exists a true zero. In addition, true ratios exist when the same variable is
measured on two different members of the population.
Example: Variables such as age, height, length, volume, rate, time and amount of rainfall are
measured on a ratio scale.
CHAPTER TWO
The method of data collection depends on the source of the data. According to source, data are
classified as:
i. Primary data
ii. Secondary data
Each source of data has its own method of collection.
1. Primary data: first-hand information collected by the investigator or the user directly
from the source.
2. Secondary data: when an investigator uses the data which has already been collected by others,
such data are called secondary data.
Primary methods of data collection: Methods that aim at collecting primary data are termed
primary methods. These include direct personal interviews or observation, indirect personal
interviews or observation, mailed questionnaires, and schedules sent through enumerators.
Secondary methods of data collection: Secondary data can be obtained from published or
unpublished documents: reports, journals, magazines, articles, etc.
The organized data can now be presented in the form of tables, graphs and diagrams. At this stage,
large data sets are presented in tables in a very summarized and condensed manner. The main
purpose of data presentation is to facilitate statistical analysis.
Frequency: the number of times a certain value or set of values occurs in a specific group, or
the number of values in a specific class of the distribution.
A frequency distribution: is the organization of raw data in table form using classes and
frequencies.
According to the type of variable, there are two types of frequency distributions. These are:
I. Categorical (qualitative) and
II. Numerical (quantitative) frequency distributions
Categorical frequency distribution: a frequency distribution in which data are classified and
presented according to qualitative (non-numerical) categories.
Example: Construct a frequency distribution for the marital status of 20 adults, classified as
single (S), married (M), divorced (D) and widowed (W), from the following data:
S,M,W,D,D,M,W,S,D,M,D,M,D,M,S,W,M,S,D,D.
Solution: In this case marital status is the variable, categorized into single (S), married (M),
divorced (D) and widowed (W), so the frequency distribution is:
Marital status    No. of adults
Single            4
Married           6
Divorced          7
Widowed           3
Total             20
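A categorical frequency distribution is just a tally of how often each category occurs. As a minimal sketch (Python's standard `collections.Counter`; the variable names are illustrative), the table above can be reproduced from the 20 raw codes:

```python
from collections import Counter

# Marital status codes of the 20 adults in the example
data = "S,M,W,D,D,M,W,S,D,M,D,M,D,M,S,W,M,S,D,D".split(",")

labels = {"S": "Single", "M": "Married", "D": "Divorced", "W": "Widowed"}
counts = Counter(data)  # tallies how many times each category occurs

print(f"{'Marital status':<16}No. of adults")
for code in ["S", "M", "D", "W"]:      # same category order as the table
    print(f"{labels[code]:<16}{counts[code]}")
print(f"{'Total':<16}{sum(counts.values())}")
```

The fixed order in the loop is a presentation choice; nominal categories have no natural order, so any order would be equally valid.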
Definitions:
Raw data: data in the original form in which they were collected.
Example: A demographer interested in the number of children a family may have took a sample
of 30 families and obtained the following observations.
Number of children in a sample of 30 families
4 2 4 3 2 8
3 4 4 2 2 8
5 3 4 5 4 5
4 3 5 2 7 3
3 6 7 3 8 4
Construct a frequency distribution for this data.
Solution: These individual observations can be arranged in ascending or descending order of
magnitude, in which case the series is called an array.
Array of the number of children in 30 families
2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8, 8
Since the variable "number of children in a family" can assume only the values 0, 1, 2, 3 and so
on, it is a discrete variable. Therefore, its frequency distribution is a discrete frequency distribution.
Frequency distribution of children in 30 families
No. of children    2   3   4   5   6   7   8   Total
No. of families    5   7   8   4   1   2   3   30
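The two steps above (sort the raw data into an array, then count each distinct value) can be sketched in Python as follows. The row layout of the data (five rows of six values) follows the table of raw observations; the variable names are illustrative.

```python
from collections import Counter

# Number of children in the sample of 30 families (five rows of six values)
rows = [
    [4, 2, 4, 3, 2, 8],
    [3, 4, 4, 2, 2, 8],
    [5, 3, 4, 5, 4, 5],
    [4, 3, 5, 2, 7, 3],
    [3, 6, 7, 3, 8, 4],
]
raw = [x for row in rows for x in row]

array = sorted(raw)      # the "array": observations in ascending order
freq = Counter(array)    # each distinct value and how often it occurs

print("Array:", array)
print("No. of children:", *sorted(freq))
print("No. of families:", *(freq[v] for v in sorted(freq)))
print("Total:", len(raw))
```

Because the variable is discrete, each distinct value gets its own row (or column) in the table; no grouping into classes is needed.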
Grouped Frequency Distribution: a frequency distribution in which several values are grouped
into one class. It is also known as a continuous frequency distribution; for continuous variables
such as height, weight and income, we construct this kind of distribution.
Class limits: separate one class from another. The limits are values that could actually appear in
the data, and there is a gap between the upper limit of one class and the lower limit of the next.
Class boundaries: separate one class in a grouped frequency distribution from another.
The lower class boundary is found by subtracting U/2 from the corresponding lower class
limit, and the upper class boundary is found by adding U/2 to the corresponding upper class
limit, where U is one unit of measurement. There is no gap between the upper boundary of one
class and the lower boundary of the next class.
Class width: the difference between the upper and lower class boundaries of any class. It
is also the difference between the lower limits of any two consecutive classes or the
difference between any two consecutive class marks.
Class mark (Mid points): it is the average of the lower and upper class limits or the
average of upper and lower class boundary.
Class Frequency: The number of observations belonging to a particular class is known as
the frequency of that class or class frequency.
Cumulative frequency: the number of observations less than or equal to (less-than type) or more
than or equal to (more-than type) a specific value.
Cumulative frequency more than: the more-than cumulative frequency of a class is obtained by
adding the frequencies of all succeeding classes to the frequency of that class.
Cumulative frequency less than: the less-than cumulative frequency of a class is obtained by
adding the frequencies of all preceding classes to the frequency of that class.
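Both cumulative frequencies are running totals, taken in opposite directions. A minimal Python sketch, reusing the discrete distribution of children in 30 families given earlier (the variable names are illustrative):

```python
from itertools import accumulate

# Discrete frequency distribution of children in 30 families (from the example)
values = [2, 3, 4, 5, 6, 7, 8]
freqs = [5, 7, 8, 4, 1, 2, 3]

# Less-than cf: frequency of this class plus all preceding classes
less_than = list(accumulate(freqs))

# More-than cf: frequency of this class plus all succeeding classes,
# i.e. accumulate the reversed list, then reverse the result back
more_than = list(accumulate(reversed(freqs)))[::-1]

for v, f, lt, mt in zip(values, freqs, less_than, more_than):
    print(f"{v}: f = {f}, less-than cf = {lt}, more-than cf = {mt}")
```

The last less-than cf and the first more-than cf both equal the total number of observations (30), which is a quick check on the arithmetic.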
Steps for constructing a grouped frequency distribution:
1. Find the maximum (Max) and the minimum (Min) observations, and then compute the range,
$R = \text{Max} - \text{Min}$.
2. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule,
$k = 1 + 3.32\log_{10} n$, where k is the number of classes desired and n is the total number of
observations.
3. Find the class width by dividing the range by the number of classes and rounding up, not off:
$W = \frac{R}{k}$
4. Pick a suitable starting point less than or equal to the minimum value.
The starting point is called the lower limit of the first class. Continue to add the class width to
this lower limit to get the rest of the lower limits.
5. To find the upper limit of the first class, subtract U (one unit of measurement) from the lower
limit of the second class. Then continue to add the class width to this upper limit to find the rest
of the upper limits.
6. Compute the class boundaries as $LCB = LCL - \frac{U}{2}$ and $UCB = UCL + \frac{U}{2}$, where
LCL = lower class limit, UCL = upper class limit, LCB = lower class boundary and UCB = upper
class boundary. The class boundaries also lie halfway between the upper limit of one class and the
lower limit of the next class. It may not always be necessary to find the boundaries.
Example: Construct a grouped frequency distribution for the following data.
11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6, then the range R=H-L=39-6=33
Step 2: Select the number of classes desired using Sturges' formula: k = 1 + 3.32 log n =
1 + 3.32 log(20) = 5.32, which rounds up to 6.
Step 3: Find the class width: w = R/k = 33/6 = 5.5, which rounds up to 6.
Step 4: Select the starting point; let it be the minimum observation, 6.
6, 12, 18, 24, 30, 36 are the lower class limits.
Step 5: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11.
11, 17, 23, 29, 35, 41 are the upper class limits.
Step 6: Find the class boundaries.
E.g. for class 1: lower class boundary = 6 - U/2 = 5.5 and upper class boundary = 11 + U/2 = 11.5.
Then continue adding w to both boundaries to obtain the remaining boundaries.
Step 7: Find the frequencies
Step 8: Find cumulative frequency.
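The complete frequency distribution follows:
Class limit   Class boundary   Class mark   Freq.   Cf (less-than type)
6 – 11        5.5 – 11.5       8.5          2       2
12 – 17       11.5 – 17.5      14.5         2       4
18 – 23       17.5 – 23.5      20.5         7       11
24 – 29       23.5 – 29.5      26.5         4       15
30 – 35       29.5 – 35.5      32.5         3       18
36 – 41       35.5 – 41.5      38.5         2       20
Total                                       20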
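Steps 7 and 8 (counting the observations in each class and accumulating them) can be checked with a short Python sketch. It assumes the class limits 6-11, 12-17, ..., 36-41 obtained in Steps 4 and 5 and U = 1; the variable names are illustrative.

```python
# The 20 observations of the example
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]

# Class limits from Steps 4 and 5 (width 6, so each upper limit = lower + 5)
limits = [(lo, lo + 5) for lo in range(6, 42, 6)]

cf = 0
table = []
for lcl, ucl in limits:
    f = sum(lcl <= x <= ucl for x in data)   # Step 7: class frequency
    cf += f                                  # Step 8: less-than cumulative frequency
    table.append((lcl, ucl, lcl - 0.5, ucl + 0.5, (lcl + ucl) / 2, f, cf))

for row in table:
    print("{}-{}   {}-{}   {}   {}   {}".format(*row))
```

Counting against the class limits (rather than the boundaries) works here because the data are whole numbers, so no observation can fall in the gap between two classes.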
Exercise 2.1: The following data are on the number of minutes to travel from home to work for a
group of automobile workers. 28 25 48 37 41 19 32 26 16 23 23 29 36 31 26 21 32 25 31 43 35 42
38 33 28. Construct a frequency distribution for this data.
Diagrammatic presentation: techniques for presenting data in visual displays using geometric shapes and pictures.
Importance:
They are attractive.
They facilitate comparison.
They are easily understandable.
Diagrams are appropriate for presenting discrete data.
The two most commonly used diagrammatic presentations for discrete as well as qualitative data
are:
Pie charts
Bar charts
Pie chart
A pie chart is a circle that is divided into sections or wedges according to the percentage of
frequencies in each category of the distribution. The angle of each sector is obtained using:
$$\text{Angle of sector} = \frac{\text{value of the part}}{\text{the whole quantity}} \times 360^\circ$$
2500 2000 4000 1500 10000
Solutions:
Step 3: Using a protractor and compass, draw each sector and label it with its name and
corresponding percentage.
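The sector angles for the figures 2500, 2000, 4000 and 1500 can be computed with the formula above. This Python sketch assumes that these four figures are the category values and that 10000 is their total (they do sum to exactly 10000); the category names are not given here, so the values are used unlabeled.

```python
# Assumed: the four figures are the category values; 10000 is their total
values = [2500, 2000, 4000, 1500]
total = sum(values)  # 10000, matching the last figure

# Angle of sector = (value of the part / whole quantity) * 360 degrees
angles = [v * 360 / total for v in values]
percents = [v * 100 / total for v in values]  # used to label each sector

for v, p, a in zip(values, percents, angles):
    print(f"{v:>5}: {p:.0f}% of the total, sector angle {a:.0f} degrees")
```

The angles always add up to 360 degrees, which is a useful check before drawing the sectors with a protractor.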
Bar Charts:
A bar chart is a set of bars (thick lines or narrow rectangles) representing some magnitude over
time or space.
There are different types of bar charts. The most common are:
Simple bar charts
Component or subdivided bar charts
Multiple bar charts
Example: The following data represent sales by product, 1957-1959, of a given company for three
products A, B and C.
Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
Solutions:
When there is a desire to show how a total (or aggregate) is divided into its component parts, we
use a component bar chart.
The bars represent the total value of a variable, with each total broken into its component parts,
and different colours or designs are used for identification.
Example:
Draw a component bar chart to represent the sales by product from 1957 to 1959.
Solutions
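In a component bar chart, each year's bar has a height equal to the year's total, and each product occupies a slice stacked on top of the previous one. A minimal Python sketch computing the bottom and top of every slice from the sales table (the stacking order A, B, C is a choice made here, not specified in the text):

```python
# Sales ($) by product and year, from the table in the example
sales = {
    "A": {1957: 12, 1958: 14, 1959: 18},
    "B": {1957: 24, 1958: 21, 1959: 18},
    "C": {1957: 24, 1958: 35, 1959: 54},
}
years = [1957, 1958, 1959]

# For each year, stack the products on top of each other:
# each component runs from its bottom to its top on the vertical axis
segments = {}
totals = {}
for year in years:
    bottom = 0
    for product in ["A", "B", "C"]:
        top = bottom + sales[product][year]
        segments[(year, product)] = (bottom, top)
        bottom = top
    totals[year] = bottom  # full height of the bar for that year
    print(year, "total =", totals[year], {p: segments[(year, p)] for p in "ABC"})
```

These (bottom, top) pairs are exactly what a plotting tool needs to draw the subdivided bars; each slice can then be given its own colour or design.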
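Multiple bar charts: bars representing two or more related quantities are placed side by side for
each period or category, with different colours or designs used for identification.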
The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly
applied graphical representations for continuous data.
Histogram: a graph that displays the data using vertical bars whose heights represent the class
frequencies. Class boundaries are placed along the horizontal axis; class marks and class limits
are sometimes used on the x-axis instead.
Example: Suppose we are given the following data to be displayed by means of a histogram.
Marks                0-20   20-40   40-60   60-80   80-100
Number of students   10     22      35      28      5
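Each histogram bar is defined by its left class boundary, its width (the class width) and its height (the class frequency). A minimal Python sketch for the marks data, printing a rough text version of the histogram; the variable names are illustrative.

```python
# Marks distribution from the example (class boundaries 0, 20, ..., 100)
boundaries = [0, 20, 40, 60, 80, 100]
freqs = [10, 22, 35, 28, 5]

# Each bar: left edge = lower boundary, width = class width, height = frequency.
# Adjacent bars share a boundary, so there are no gaps between them.
bars = [(lo, hi - lo, f) for lo, hi, f in zip(boundaries, boundaries[1:], freqs)]

for left, width, height in bars:
    print(f"{left:>3}-{left + width:<3} | " + "#" * height)  # text sketch of each bar
```

With matplotlib, the same triples could be drawn with `plt.bar(left, height, width=width, align="edge")` so that each bar starts exactly at its lower boundary.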
Frequency polygon
A frequency polygon is a line graph drawn by taking the frequencies of the classes along the
vertical axis and their respective class marks along the horizontal axis, and then joining the plotted
points with straight lines. The frequency polygon can also be drawn on a histogram by joining the
midpoints of the tops of the rectangles with straight lines.
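The vertices of the frequency polygon are the points (class mark, frequency). A minimal Python sketch for the marks data of the histogram example, listing the straight-line segments that join consecutive vertices:

```python
# Marks distribution from the histogram example
boundaries = [0, 20, 40, 60, 80, 100]
freqs = [10, 22, 35, 28, 5]

# Class marks (midpoints) are the x-coordinates of the polygon's vertices
marks = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
points = list(zip(marks, freqs))

# Consecutive vertices are joined by straight line segments
segments = list(zip(points, points[1:]))
for (x1, y1), (x2, y2) in segments:
    print(f"({x1:g}, {y1}) -> ({x2:g}, {y2})")
```

Each class mark is also the midpoint of the top of the corresponding histogram bar, which is why the polygon can be drawn directly over the histogram.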