statistics-notes-module-1 (introduction)

The course on Social Statistics aims to provide learners with essential statistical knowledge and skills applicable in nutrition and health. Students will learn to explain statistical concepts, collect and present data, apply measures of central tendency and dispersion, perform significance tests, and develop research questions. The course covers topics such as the definition and functions of statistics, data collection methods, and the classification of variables and measurement scales.

Uploaded by

Kelvin Muriithi
Copyright
© © All Rights Reserved

SOCIAL STATISTICS – MOD 1

COURSE PURPOSE
The purpose of this course is to equip you, the learner, with knowledge and skills in statistics for
application in nutrition and health.

EXPECTED LEARNING OUTCOMES


By the end of the course you, the learner, should be able to:
1. Explain concepts in statistics.
2. Collect, organize and present data in a scientific way.
3. Apply various measures of central tendency and dispersion, and the concepts of correlation
and regression.
4. Perform tests of significance.
5. Develop and explain research questions.

COURSE OUTLINE
1. Introduction to statistics
i. Meaning of statistics and social statistics
ii. Reasons for studying statistics
iii. General Functions of statistics
iv. General Limitations of statistics
v. Scales of measurement.
2. Collection, classifying, organization and presentation of data including construction of
frequency distribution table.
3. - Measures of central tendency: mean, mode and median.
- Measures of dispersion: variance, standard deviation, range, coefficient of variation,
deciles, percentiles, quartiles, interquartile range and quartile deviation.
CHAPTER ONE

INTRODUCTION TO STATISTICS
TOPIC OBJECTIVES
At the end of the topic, you should be able to:
i. Define social statistics and state its relationship with statistics.
ii. State some reasons for studying Biostatistics.
iii. Describe functions and limitations of statistics.
What is statistics?
Statistics is the science of conducting studies to 1) collect, 2) organize, 3) summarize, 4) analyse
and 5) draw conclusions from data.
What, then, is social statistics?
It is the use of statistical concepts to analyse and draw conclusions about the human social
environment.
Why should a student study social statistics?
Students are required to study statistics for several reasons, some of which are outlined
below:
i. To be able to read and understand statistical studies performed on the human social
environment, e.g. to understand the vocabulary, symbols and statistical concepts used in
such studies.
ii. To obtain ideas on how to design experiments, collect, organise, analyse and summarise
data, and draw sound conclusions from the data.
iii. To use the knowledge gained to become better consumers and citizens, able to
make informed decisions about what products to consume or buy based on
statistical studies performed on the human social environment.

General functions of statistics


Statistics is used for various purposes or functions. It is used to simplify mass data and to make
comparisons easier. It is also used to bring out trends and tendencies in the data, as well as the
hidden relations between variables. All this makes decision making much easier. Let us
look at each function of statistics in detail. These functions also apply to social statistics.
1. Statistics simplifies mass data

Statistical methods reduce the complexity of data and so help in understanding any huge mass of
data.

2. Statistics makes comparison easier

Without statistical methods and concepts, data cannot easily be collected and compared.
Statistics helps us to compare data collected from different sources. Grand totals,
measures of central tendency, measures of dispersion, graphs and diagrams, and the coefficient
of correlation all provide ample scope for comparison.

3. Statistics brings out trends and tendencies in the data

Once data is collected, its trends and tendencies can be analysed using the
various concepts of statistics.

4. Statistics brings out the hidden relations between variables

Statistical analysis helps in drawing inferences from data and brings out the hidden
relations between variables.

5. Statistics makes decision making easier

With the proper application of statistics and statistical software packages to the collected data,
managers can make effective decisions, which can increase the profit of a business.

General limitations of statistics


Despite its power, usefulness and universal applicability, statistics has its own
limitations and imperfections, which include but are not limited to:
1. Statistics does not study individuals.
2. Statistics does not study qualitative phenomena.
3. Statistical data do not reveal the entire story.
4. Statistical results are not always unquestionable.
5. Statistical laws are true on the average or in the long run.
6. Statistical results might lead to fallacious conclusions if they are quoted without their
context or if they are manipulated.
7. Statistical data are liable to be misused easily.

Each of these limitations is discussed in detail below:


1. Statistics does not study individuals.

Individual facts and figures are of importance to individuals only; statistics does not
deal with them. It deals with mass phenomena and therefore throws light on the whole
of a given group. Statistics deals with aggregates, though for purposes of analysis these
aggregates are very often reduced to single figures such as averages and percentages.
2. Statistics does not study qualitative phenomena

Statistical methods can be applied only to problems that are capable of
quantitative expression, i.e. they are not applicable to the study of facts that are not
quantitatively measurable. For example, attributes like honesty, health, skills etc. cannot be
measured in figures.

3. Statistical data do not reveal the entire story

Statistical data collected on a problem under study do not reveal the entire story of the
problem. This is because many problems are affected by factors which are incapable of
statistical analysis. Therefore it is not always possible to examine a problem in all its
manifestations by statistics alone.

Quote: “It is certainly important in statistical investigation not to forget that the data do not tell
everything, and in their summarized, reduced form, they leave out a lot of information that may
be of importance.”

4. Statistical results are not always unquestionable

Statistical results are more in the nature of estimates and probabilities than exact statements.
Statistical methods are only one of several methods of studying problems. They help us in studying
trends, in forming an idea of the probabilities, and in knowing how a given phenomenon has been
behaving generally, but their results are not necessarily unquestionable and reliable.

5. Statistical laws are true on the average or in the long run.

That is, they are not true in their entirety; they are statements of long-term tendencies and
therefore cannot be considered as stable as the laws of physical science. This is because
statistics deals with phenomena which are affected by a multiplicity of causes, and it is not
possible to study statistically the effects of each of those causes or factors separately, as it is
under experimental methods. Because of this limitation, the conclusions arrived at are
not perfectly accurate, and consequently the same conclusions cannot be arrived at under similar
conditions at all times.

6. Statistical results might lead to fallacious conclusions if they are quoted without their
context or if they are manipulated
Statistical results are true only in a particular context; if the context is absent, we cannot be sure
of their validity. E.g. if the average profits earned by two firms over the past few years are
quoted and the averages happen to be identical, the conclusion may
be drawn that both firms are doing equally well. But if the actual profits earned by each of
the two firms during the period were examined, it might be found that the profits of one firm are
going up year by year while those of the other are going down year by year.
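The two-firm situation can be illustrated with a short Python sketch. The profit figures below are hypothetical and chosen only so that the two averages coincide; they do not come from any real firm.

```python
from statistics import mean

# Hypothetical yearly profits (illustrative figures only).
firm_a = [10, 20, 30, 40, 50]   # profits rising every year
firm_b = [50, 40, 30, 20, 10]   # profits falling every year

mean_a = mean(firm_a)
mean_b = mean(firm_b)

# Year-on-year changes reveal the trend that the average hides.
changes_a = [later - earlier for earlier, later in zip(firm_a, firm_a[1:])]
changes_b = [later - earlier for earlier, later in zip(firm_b, firm_b[1:])]

print("Average profit of each firm:", mean_a, mean_b)
print("Yearly change, firm A:", changes_a)
print("Yearly change, firm B:", changes_b)
```

Quoting only the two averages would suggest the firms are doing equally well; the yearly changes show opposite trends.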
7. Statistics are liable to be misused easily
Any person can use statistics and draw any type of conclusion he or she likes; they may be used by
anyone to make a worse case appear to be a better case. Like medicines in the hands of quacks,
statistics can easily be misused by the inexpert. Statistical data can be scientifically handled only
by those who have expert knowledge of statistical methods.

GENERAL REVISION QUESTIONS


1. State the difference between statistics and Biostatistics. (2 marks)
2. State 2 reasons why Biostatistics is relevant to your course. (2 marks)
3. Describe 5 functions of statistics. (10 marks)
4. Describe 5 limitations of statistics (10 marks)
CHAPTER TWO

DATA COLLECTION AND CLASSIFICATION
TOPIC OBJECTIVES
At the end of the topic, you should be able to:
i. Define the basic terms used in statistics
ii. Describe variables as used in statistics.
iii. Describe the different scales of measurement used in statistics.
iv. Explain the different methods of data collection.
v. Draw a frequency distribution table.
vi. Describe the different methods of data representation.
Definitions of terms that will be used in this chapter and other chapters
1. Population: a collection of all subjects that are being studied.
2. Sample: a group of subjects selected from a population.
3. Data: the values (measurements or observations) that a variable can assume.
4. Data set: a collection of data values.
5. Datum or data value: each value in a data set.
6. Variables: characteristics that assume different values, e.g. age, height, weight, etc.
7. Statistic: a characteristic or measure obtained by using the data values from a sample.
8. Parameter: a characteristic or measure obtained by using all the data values for a
specific population.
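The difference between a statistic and a parameter can be sketched in Python. The population of ages below is randomly generated and purely hypothetical; the sample size of 50 is likewise an assumption made for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 1,000 subjects.
population = [random.randint(18, 80) for _ in range(1000)]

# A sample is a group of subjects selected from the population.
sample = random.sample(population, 50)

parameter = mean(population)  # measure obtained from ALL population values
statistic = mean(sample)      # measure obtained from the sample values only

print(f"Parameter (population mean age): {parameter:.2f}")
print(f"Statistic (sample mean age): {statistic:.2f}")
```

The two means are usually close but not identical, which is why a statistic is treated as an estimate of the parameter.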

The two types of data are:

• Primary data: data obtained from observations, experiments, interviews, etc.
• Secondary data: data obtained from published materials such as books,
newspapers, etc.

Variables can be classified as qualitative or quantitative.

1. Qualitative variables are variables that can be placed into distinct categories, according to
some characteristic or attribute. For example, if subjects are classified according to gender
(male or female), then the variable gender is qualitative. Other examples of qualitative
variables are religious preference and geographic location.
2. Quantitative variables are numerical and can be ordered or ranked. For example, the variable
age is numerical, and people can be ranked in order according to the value of their ages. Other
examples of quantitative variables are heights, weights, and body temperatures.
Quantitative variables can be further classified into two groups:
• Discrete variables and
• Continuous variables.

Discrete variables can be assigned values such as 0, 1, 2, 3 and are said to be countable.
Examples of discrete variables are the number of children in a family, the number of students in
a classroom, and the number of calls received by a switchboard operator each day for a month.

Therefore, discrete variables are variables that assume values that can be counted. They are often
whole numbers.
Continuous variables, by comparison, can assume an infinite number of values in an interval
between any two specific values. Temperature, for example, is a continuous variable, since the
variable can assume an infinite number of values between any two given temperatures.

Therefore, continuous variables are variables that assume an infinite number of values between
any two specific values. They are obtained by measuring and often include fractions and
decimals.

In addition to being classified as qualitative or quantitative, variables can be classified by how
they are categorized, counted, or measured. For example, can the data be organized into specific
categories, such as area of residence (rural, suburban, or urban)? Can the data values be ranked,
such as first place, second place, etc.? Or are the values obtained from measurement, such as
heights, IQs, or temperature? This type of classification (i.e., how variables are categorized,
counted, or measured) uses measurement scales, and four common types of scales are used:

i. Nominal scale,
ii. Ordinal scale,
iii. Interval scale, and
iv. Ratio scale.

Measurement Scales can now be explained in detail:

i. Nominal scale of measurement.

The nominal scale of measurement classifies data into mutually exclusive (non-overlapping)
categories in which no order or ranking can be imposed on the data.

A sample of college lecturers classified according to subject taught (e.g., English, history,
psychology, or mathematics) is an example of nominal-level measurement. Classifying survey
subjects as male or female is another example of nominal-level measurement.
No ranking or order can be placed on the data. Classifying residents according to zip codes is
also an example of the nominal level of measurement. Even though numbers are assigned as zip
codes, there is no meaningful order or ranking. Other examples of nominal-level data are
classification according to political party (CORD, JUBILEE, Independent, etc.), religion
(Christianity, Hindu, Islam, Pagan etc.), and marital status (single, married, divorced, widowed,
separated etc.).
ii. Ordinal scale of measurement.

The ordinal level of measurement classifies data into categories that can be ranked;
however, precise differences between the ranks do not exist.

Data measured at this level can be placed into categories, and these categories can be ordered, or
ranked. For example, from student evaluations, guest speakers might be ranked as superior,
average, or poor. Floats in a homecoming parade might be ranked as first place, second place,
etc.
Note that precise measurement of differences in the ordinal level of measurement does not exist.
For instance, when people are classified according to their build (small, medium, or large), a
large variation exists among the individuals in each class. Other examples of ordinal data are
letter grades (A, B, C, D, and F).
iii. Interval scale of measurement.

The interval level of measurement ranks data, and precise differences between units of
measure do exist; however, there is no meaningful zero.

This level differs from the ordinal level in that precise differences do exist between units. For
example, many standardized psychological tests yield values measured on an interval scale. IQ is
an example of such a variable. There is a meaningful difference of 1 point between an IQ of 109
and an IQ of 110. Temperature is another example of interval measurement, since there is a
meaningful difference of 1°F between each unit, such as 72°F and 73°F. One property is lacking in
the interval scale: there is no true zero. For example, IQ tests do not measure people who have
no intelligence. For temperature, 0°F does not mean no heat at all.

iv. Ratio scale of measurement

The ratio level of measurement possesses all the characteristics of interval
measurement, and there exists a true zero. In addition, true ratios exist when the same
variable is measured on two different members of the population.

Examples of ratio scales are those used to measure height, weight, area, and number of phone
calls received. Ratio scales have differences between units (1 inch, 1 pound, etc.) and a true zero.
In addition, the ratio scale contains a true ratio between values. For example, if one person can
lift 200 pounds and another can lift 100 pounds, then the ratio between them is 2 to 1. Put another
way, the first person can lift twice as much as the second person.
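The practical difference between the interval and ratio scales can be sketched in Python. The weights come from the example above; the two temperatures (80°F and 40°F) and the helper function name are assumptions introduced only for illustration.

```python
def fahrenheit_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvins (a scale with a true zero)."""
    return (temp_f - 32) * 5 / 9 + 273.15

# Ratio scale: weight lifted has a true zero, so the ratio is meaningful.
lift_first, lift_second = 200, 100           # pounds
weight_ratio = lift_first / lift_second      # "twice as much" is a valid statement

# Interval scale: degrees Fahrenheit have no true zero.
temp_high, temp_low = 80, 40                 # hypothetical temperatures in °F
naive_ratio = temp_high / temp_low           # arithmetic gives 2, but 80 °F is not "twice as hot"
kelvin_ratio = fahrenheit_to_kelvin(temp_high) / fahrenheit_to_kelvin(temp_low)

print("Weight ratio:", weight_ratio)
print("Naive Fahrenheit ratio:", naive_ratio)
print(f"Ratio on a true-zero temperature scale: {kelvin_ratio:.3f}")
```

Differences such as 80°F minus 40°F remain meaningful on the interval scale; only the ratio changes once the arbitrary zero is removed.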
9. Errors
There are three types of errors:
i. Gross errors,
ii. Systematic errors (determinate error) and
iii. Random errors (indeterminate errors)
Gross errors are errors so serious that they lead one to abandon the process and start again. They
make results extreme, i.e. too small or too large. Values affected by gross errors typically appear as outliers.
Outlier: an extreme value in a data set, i.e. one that is either very small or very large
compared to the other values in the data.
Systematic errors are errors that have definite values and assignable cause. They are errors that
can be eliminated and affect accuracy of results. They are caused by personal errors, method
errors and instrumental errors. They can be eliminated by weighing by difference and using
calibration procedures.
Random errors are errors that are unavoidable and can’t be eliminated quickly. They are
present in every physical measurement due to uncertainty. They lead to values being on both
sides of the mean. They affect precision of results.
10. Precision
It is the reproducibility of results, i.e. the ability to get the same results on repetition. It is a
guide to, but not a guarantee of, accuracy. The lower the standard deviation, the higher the
precision.
11. Accuracy
It is the nearness of a result or arithmetic mean to the true or standard value. In most cases the
result is compared to the true value: the lower the error, the higher the accuracy.
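A minimal Python sketch of the two ideas follows. The true value and both sets of readings are hypothetical, and the function name is an assumption; precision is measured by the sample standard deviation and accuracy by the distance of the mean from the true value.

```python
from statistics import mean, stdev

def accuracy_and_precision(readings, true_value):
    """Return (error of the mean, standard deviation) for repeated readings."""
    error = abs(mean(readings) - true_value)  # smaller error -> higher accuracy
    spread = stdev(readings)                  # smaller spread -> higher precision
    return error, spread

true_value = 10.0  # hypothetical true (standard) value

# Hypothetical repeated measurements from two instruments.
instrument_a = [10.1, 9.9, 10.0, 10.2, 9.8]    # centred on the true value
instrument_b = [11.0, 11.1, 10.9, 11.0, 11.0]  # tightly grouped but shifted

error_a, spread_a = accuracy_and_precision(instrument_a, true_value)
error_b, spread_b = accuracy_and_precision(instrument_b, true_value)

print(f"A: error = {error_a:.3f}, standard deviation = {spread_a:.3f}")
print(f"B: error = {error_b:.3f}, standard deviation = {spread_b:.3f}")
```

In this made-up case instrument B is the more precise but the less accurate of the two, the pattern a systematic error would produce.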
12. Statistical Data Array
It’s the arrangement of raw data in either ascending or descending order.
Example
7, 10, 5, 3, 2, 9, 8, 7, 8, 7, 4, 5
Ascending order will be: 2, 3, 4, 5, 5, 7, 7, 7, 8, 8, 9, 10
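As a quick check, the same array can be produced in Python with the built-in `sorted` function; the variable names are illustrative.

```python
raw_data = [7, 10, 5, 3, 2, 9, 8, 7, 8, 7, 4, 5]

ascending = sorted(raw_data)                  # smallest to largest
descending = sorted(raw_data, reverse=True)   # largest to smallest

print("Ascending: ", ascending)
print("Descending:", descending)
```

Every raw value must appear in the array, so the sorted list has the same 12 entries as the raw data.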
DATA COLLECTION

Data can be collected using several methods, some of which are outlined and explained in detail
below.
i. Surveying
ii. Sampling
iii. Surveying records
iv. Direct observation etc.
1. Surveying
Surveying can be done in a variety of ways:
• Telephone survey,
• Mailed questionnaire, and
• Personal interview.

Some details about surveying:

i. Telephone surveys
Telephone surveys have an advantage over personal interview surveys in that they are less costly.
Also, people may be more candid in their opinions since there is no face-to-face contact. A major
drawback to the telephone survey is that some people in the population will not have phones or will
not answer when the calls are made; hence, not all people have a chance of being surveyed. Also, many
people now have unlisted numbers and cell phones, so they cannot be surveyed. Finally, even the
tone of voice of the interviewer might influence the response of the person who is being
interviewed.

ii. Mailed questionnaire surveys
Mailed questionnaire surveys can be used to cover a wider geographic area than telephone surveys or
personal interviews, since they are less expensive to conduct. Also, respondents can remain
anonymous if they desire. Disadvantages of mailed questionnaire surveys include a low number
of responses and inappropriate answers to questions. Another drawback is that some people may
have difficulty reading or understanding the questions.
iii. Personal interview surveys
Personal interview surveys have the advantage of obtaining in-depth responses to questions from the
person being interviewed. One disadvantage is that interviewers must be trained in asking questions
and recording responses, which makes the personal interview survey more costly than the other two
survey methods. Another disadvantage is that the interviewer may be biased in his or her
selection of respondents.

2. Sampling
Sample
A sample is “a smaller (but hopefully representative) collection of units from a population used to
determine truths about that population” (Field, 2005).
Population
A population is a complete set of elements (persons or objects) that possess some common
characteristic defined by the sampling criteria established by the researcher.

There are two types of population:

Target population
The entire group of people or objects to which the researcher wishes to generalize the study
findings and which meets the set of criteria of interest to the researcher, e.g. all institutionalized
elderly with Alzheimer's, all people with AIDS, all low birth weight infants etc.
Accessible population
The portion of the population to which the researcher has reasonable access; it may be a subset of
the target population and may be limited to a region, state, city, county, or institution, e.g. all
pregnant teens in the County of Kiambu, all people with AIDS in Thika District.
Reasons why sampling is done
• The total cost of studying a sample is much less than that of studying the whole population.
• With a smaller number of observations, results can be provided much faster
than with the total number of observations.
• Sampling has a greater scope regarding the variety of information by virtue of
its flexibility and adaptability.
• Sampling allows the reliability of the results to be appraised.

Limitations of sampling

• Errors due to sampling may be high for small administrative areas.
• Sampling may not be feasible for problems that require very high accuracy.

Sampling is concerned with the selection of a subset of individuals from within a statistical
population to estimate characteristics of the whole population.
The stages of sampling include the following steps:
i. Defining the population of concern
ii. Specifying a sampling frame, a set of items or events possible to measure
iii. Specifying a sampling method for selecting items or events from the frame
iv. Determining the sample size
v. Implementing the sampling plan
vi. Sampling and data collecting
vii. Reviewing the sampling process

Sampling Frame
In the most straightforward case, such as the sentencing of a batch of material from production
(acceptance sampling by lots), it is possible to identify and measure every single item in the
population and to include any one of them in our sample. However, in the more general case this
is not possible. There is no way to identify all rats in the set of all rats. Where voting is not
compulsory, there is no way to identify in advance which people will actually vote at a forthcoming
election.
As a remedy, we seek a sampling frame in which we can identify every single
element and include any of them in our sample. The sampling frame must be representative of the
population.
Types of Sampling
Two general approaches of sampling are used in social science research. With probability
sampling, all elements (e.g., persons, households) in the population have some opportunity of
being included in the sample, and the mathematical probability that any one of them will be
selected can be calculated. With non-probability sampling, in contrast, population elements are
selected on the basis of their availability (e.g., because they volunteered) or because of the
researcher's personal judgment that they are representative.
Probability Sampling includes: Simple Random sampling, Systematic sampling, Stratified
random sampling, Multistage sampling, Multiphase sampling and Cluster sampling
Non-Probability Sampling includes: Convenience sampling, Purposive sampling and Quota
sampling.

Simple Random Sampling


It is applicable when the population is small, homogeneous and readily available. All subsets of the
frame of a given size are given an equal probability of selection, and each element of the frame thus
has an equal probability of selection. It provides for the greatest number of possible samples. It is
done by assigning a number to each unit in the sampling frame; a table of random numbers or a lottery
system is then used to determine which units are to be selected.
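The lottery procedure can be sketched in Python, with the standard library's random number generator standing in for a table of random numbers. The frame of 100 numbered units, the sample size and the function name are assumptions made for illustration.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct units; every subset of that size is equally likely."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(frame, sample_size)

# Hypothetical sampling frame: units numbered 1 to 100.
sampling_frame = list(range(1, 101))

chosen = simple_random_sample(sampling_frame, 10, seed=42)
print("Selected units:", sorted(chosen))
```

Sampling is without replacement here, so no unit can be selected twice.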
Advantages
• Estimates are easy to calculate.
• Simple random sampling is always an EPS (equal probability of selection) design, but not all
EPS designs are simple random sampling.

Disadvantages
• If the sampling frame is large, this method is impracticable.
• Minority subgroups of interest in the population may not be present in the sample in sufficient
numbers for study.

Systematic Sampling
It relies on arranging the target population according to some ordering scheme and then selecting
elements at regular intervals through that ordered list. The sampling technique involves a random
start and then proceeds with the selection of every kth element from then onwards. In this case, k
= (population size/sample size). It is important that the starting point is not automatically the first
in the list, but is instead randomly chosen from within the first to the kth element in the list. A
simple example would be to select every 10th name from the telephone directory (an 'every 10th'
sample, also referred to as 'sampling with a skip of 10').
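A minimal Python sketch of the procedure follows, assuming an ordered frame of 100 hypothetical names and a sample of 10, so that k = 10. The function name and the frame are illustrative only.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Pick a random start within the first k units, then take every kth unit."""
    k = len(frame) // sample_size       # sampling interval (skip)
    rng = random.Random(seed)
    start = rng.randrange(k)            # random position among the first k units
    return frame[start::k][:sample_size]

# Hypothetical ordered list, e.g. names in a directory.
directory = [f"Name {i}" for i in range(1, 101)]

selected = systematic_sample(directory, 10, seed=3)
print(selected)
```

The positions of the selected names differ by exactly k, which is what makes the method vulnerable to hidden periodicity in the list.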
Advantages
• The sample is easy to select.
• A suitable sampling frame can be identified easily.
• The sample is evenly spread over the entire reference population.

Disadvantages
• The sample may be biased if a hidden periodicity in the population coincides with that of the
selection.
• It is difficult to assess the precision of the estimate from one survey.

Stratified Sampling
It is a technique used where the population embraces a number of distinct categories, so that the
frame can be organized by these categories into separate "strata." Each stratum is then sampled as an
independent sub-population, out of which individual elements can be randomly selected.
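Proportionate stratified sampling, in which the same sampling fraction is applied to every stratum, can be sketched in Python as below. The two strata, their sizes, the 10% sampling fraction and the function name are all hypothetical.

```python
import random

def stratified_sample(strata, sampling_fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    sample = {}
    for name, units in strata.items():
        n = round(len(units) * sampling_fraction)  # units to draw from this stratum
        sample[name] = rng.sample(units, n)
    return sample

# Hypothetical frame organised into two strata by area of residence.
strata = {
    "rural": [f"R{i}" for i in range(60)],
    "urban": [f"U{i}" for i in range(40)],
}

picked = stratified_sample(strata, 0.10, seed=7)
for name, units in picked.items():
    print(name, len(units), units)
```

Each stratum is sampled independently, so a different fraction or even a different sampling method could be substituted for any one stratum.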
Advantages
• Every unit in a stratum has the same chance of being selected.
• Using the same sampling fraction for all strata ensures proportionate representation in the
sample.
• Adequate representation of minority subgroups of interest can be ensured by stratification
and by varying the sampling fraction between strata as required.
• Each stratum is treated as an independent population, hence different sampling
approaches can be applied to different strata.

Disadvantages
• A sampling frame has to be prepared separately for each stratum of the population.
• When examining multiple criteria, stratifying variables may be related to some, but not to
others, further complicating the design, and potentially reducing the utility of the strata.
• In some cases (such as designs with a large number of strata, or those with a specified
minimum sample size per group), stratified sampling can potentially require a larger
sample than would other methods.
Stratification is sometimes introduced after the sampling phase in a process called ‘post
stratification’. This approach is typically implemented due to a lack of prior knowledge of an
appropriate stratifying variable or when the researcher lacks the necessary information to create a
stratifying variable during the sampling phase. Although the method is susceptible to the pitfalls
of post hoc approaches, it can provide several benefits in the right situation. Implementation
usually follows a simple random sample. In addition to allowing for stratification on an ancillary
variable, post stratification can be used to implement weighting, which can improve the precision
of a sample's estimates.
Choice-based sampling is one of the stratified sampling strategies. In this, data are stratified on the
target and a sample is taken from each stratum so that the rare target class will be more
represented in the sample. The model is then built on this biased sample. The effects of the input
variables on the target are often estimated with more precision with the choice-based sample,
even when a smaller overall sample size is taken compared to a random sample. The results
usually must be adjusted to correct for the oversampling.

Cluster Sampling
Cluster sampling is an example of 'two-stage sampling', where in the first stage a sample of areas is
chosen and in the second stage a sample of respondents within those areas is selected.
The population is divided into clusters of homogeneous units, usually based on geographical
contiguity. The sampling units are groups rather than individuals: a sample of such clusters is
selected, and all units from the selected clusters are studied.
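The variant in which every unit of each selected cluster is studied can be sketched in Python. The village names, household labels and function name are hypothetical and serve only to show that clusters, not individuals, are what is drawn at random.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters and keep every unit inside them."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical clusters: households grouped by village.
villages = {
    "Village A": ["A1", "A2", "A3"],
    "Village B": ["B1", "B2"],
    "Village C": ["C1", "C2", "C3", "C4"],
    "Village D": ["D1", "D2", "D3"],
}

studied = cluster_sample(villages, 2, seed=5)
for village, households in studied.items():
    print(village, households)
```

A second stage could be added by drawing a random sample of households within each chosen village instead of keeping them all.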
Advantages
• It cuts down on the cost of preparing a sampling frame.
• It can reduce travel and other administrative costs.

Disadvantages
• Sampling error is higher than for a simple random sample of the same size.

Cluster sampling is often used to evaluate vaccination coverage in EPI (the Expanded Programme on
Immunization).

Difference between Strata and Clusters


Although strata and clusters are both non-overlapping subsets of the population, they differ in
several ways.
• All strata are represented in the sample, but only a subset of clusters is in the sample.
• With stratified sampling, the best survey results occur when elements within strata are
internally homogeneous. However, with cluster sampling, the best results occur when
elements within clusters are internally heterogeneous.

Multistage Sampling
It involves a complex form of cluster sampling in which two or more levels of units are
embedded one in the other. This technique is essentially the process of taking random samples of
preceding random samples. It is not as effective as true random sampling, but it probably solves
more of the problems inherent to random sampling.
It is an effective strategy because it banks on multiple randomizations, and it is used frequently
when a complete list of all members of the population does not exist or is inappropriate.
Advantages
• Surveys by this procedure are less costly, less laborious and more purposeful.

Quota Sampling
In this technique of sampling, the population is first segmented into mutually exclusive
sub-groups, just as in stratified sampling, and then judgment is used to select subjects or units from
each segment based on a specified proportion.
For example, an interviewer may be told to sample 200 females and 300 males between the ages
of 45 and 60. It is this second step which makes the technique one of non-probability sampling.
In quota sampling the selection of the sample is non-random. For example, interviewers might be
tempted to interview those who look most helpful. The problem is that these samples may be
biased because not everyone gets a chance of selection. This non-random element is its greatest
weakness, and quota versus probability sampling has been a matter of controversy for many years.
Convenience Sampling
Also called grab, opportunity, accidental or haphazard sampling.
It is a type of non-probability sampling in which the sample is drawn from that part of
the population which is close to hand, i.e. readily available and convenient.
A researcher using such a sample cannot scientifically make generalizations about the total
population, because the sample would not be representative enough. This type of
sampling is most useful for pilot testing.
ORGANISATION AND PRESENTATION OF DATA
At the end of this section you should be able to:
1. Organize data using a frequency distribution.
2. Represent data in frequency distributions graphically using histograms, frequency
polygons, and ogives/cumulative frequency graphs.
3. Represent data using bar graphs or charts, pie graphs, pictograms etc.

CONSTRUCTION OF A FREQUENCY DISTRIBUTION TABLE


When summarizing a large amount of data, it is helpful to arrange the numbers in classes or categories
and then determine the number of values falling in each class. This number is known as the class
frequency. From this it is possible to arrange the data in classes together with their corresponding
class frequencies.
Example
The following marks were obtained in a class as CAT marks:
35, 42, 49, 32, 46, 37, 37, 39, 29, 48, 38, 39, 31, 30, 48, 42, 44, 40, 38, 36, 35, 26, 28, 33, 47, 46,
41, 25, 31, 26, 49, 33, 32, 36, 37, 35, 34, 44, 37, 31, 38, 33, 39, 31, 35, 36, 33, 44, 35, 30, 37, 37,
44, 32, 40 and 44.
Use the data to construct a frequency distribution table:
General procedure for forming a frequency distribution table:

Step 1: Determine the classes:
• Find the highest and lowest values.
• Find the range by:
Range = highest value – lowest value
• Select the number of classes desired.
• Find the width by dividing the range by the number of classes and rounding up.
• Select a starting point (usually the lowest value or any convenient number less
than the lowest value); add the width repeatedly to get the lower class limits.
• Find the upper class limits.
• Find the boundaries.
Step 2: Tally the data.
Step 3: Find the numerical frequencies from the tallies, and find the cumulative
frequencies.
Solution
Highest mark = 49
Lowest mark = 25
Range = 49 – 25 = 24
No. of classes = 5
Class width = range / no. of classes = 24 / 5 = 4.8 ≈ 5 (rounded up)

Therefore the frequency distribution table is of the form:

Class      Frequency (f)   Class mark (X)
25 – 29    5               27
30 – 34    12              32
35 – 39    25              37
40 – 44    11              42
45 – 49    7               47
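
The general procedure above can be sketched in Python. This is only an illustrative sketch: the function name frequency_table and its arguments are hypothetical, it assumes whole-number data, and it tallies the 56 marks exactly as printed in the example, so its counts reflect that printed list rather than the table.

```python
import math
from collections import Counter

# CAT marks exactly as printed in the example (56 values).
marks = [
    35, 42, 49, 32, 46, 37, 37, 39, 29, 48, 38, 39, 31, 30, 48, 42, 44, 40,
    38, 36, 35, 26, 28, 33, 47, 46, 41, 25, 31, 26, 49, 33, 32, 36, 37, 35,
    34, 44, 37, 31, 38, 33, 39, 31, 35, 36, 33, 44, 35, 30, 37, 37, 44, 32,
    40, 44,
]


def frequency_table(data, n_classes):
    """Return (lower limit, upper limit, frequency, class mark) per class.

    Hypothetical helper; assumes whole-number data and starts the first
    class at the lowest value.
    """
    lowest, highest = min(data), max(data)
    data_range = highest - lowest
    width = math.ceil(data_range / n_classes)  # Step 1: round the width up
    if width * n_classes <= data_range:
        width += 1  # make sure the highest value falls inside the last class
    # Step 2: tally each value by the index of the class it falls in.
    tally = Counter((x - lowest) // width for x in data)
    rows = []
    for i in range(n_classes):
        lower = lowest + i * width
        upper = lower + width - 1  # upper limit for whole-number data
        rows.append((lower, upper, tally.get(i, 0), (lower + upper) / 2))
    return rows


table = frequency_table(marks, n_classes=5)
for lower, upper, f, class_mark in table:
    print(f"{lower} - {upper}   f = {f:2d}   X = {class_mark:g}")
```

Recounting in this way is a quick check on a hand tally: the frequencies should always add up to the number of values listed.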

Class Interval and Limits


In the class 35 – 39 of the above table, 35 – 39 is the class interval. The end values 35 and 39
are called the class limits: 35 is the lower class limit and 39 is the upper class limit. This
class can accommodate values between 34.5 and 39.5; the values 34.5 and 39.5 are called the
class boundaries.

Class size or width is the difference between the class boundaries: 39.5 – 34.5 = 5.

The midpoint or class mark (X) is obtained by adding the lower and upper class limits and then
dividing by 2. The midpoint for class 35 – 39 is
X = (35 + 39) / 2 = 37
The same definitions apply to all the other classes.
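
As a small illustration, the boundaries, width and class mark of any class can be computed from its limits. The helper class_summary below is a hypothetical name, and the sketch assumes the data are recorded to the nearest whole unit (so boundaries lie half a unit beyond the limits).

```python
def class_summary(lower_limit, upper_limit, unit=1):
    """Boundaries, width and class mark for one class of a frequency table.

    `unit` is the precision of the recorded data (assumed to be 1 here).
    """
    half = unit / 2
    lower_boundary = lower_limit - half
    upper_boundary = upper_limit + half
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "class_mark": (lower_limit + upper_limit) / 2,
    }


summary = class_summary(35, 39)
print(summary)
```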

DATA PRESENTATION
Data can be presented using many methods, some of which are outlined below:
1. Frequency polygon
A frequency polygon is formed by plotting the class frequencies against the class midpoints
(the midpoints of the tops of the histogram bars) and joining the points with line segments.
2. Histogram
A Histogram is a graphical display of data using bars of different heights. It is similar to a Bar
Chart, but a histogram groups numbers into ranges. Histograms are a good way to show
continuous data such as weight, height or time taken.

3. Bar Graphs
A Bar Graph (also called Bar Chart) is a graphical display of data using bars of different heights.

4. Pie Chart
A pie chart is a circular chart that uses "pie slices" to show the relative sizes of data.

5. Pictographs
A Pictograph is a way of showing data using images. Each image stands for a certain number of
things.
6. Ogives
Ogives, also known as cumulative frequency graphs, are graphs that can be used to determine
how many data values lie above or below a particular value in a data set. The cumulative
frequency is calculated from a frequency table by adding each frequency to the total of the
frequencies of all the classes before it. The last cumulative frequency is always equal to the
total number of data values, since every frequency has been added by then.
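
A minimal sketch of the cumulative frequency calculation, using the frequencies from the frequency distribution table above. The variable names are illustrative, and plotting each cumulative frequency against the upper class boundary is the usual convention assumed here.

```python
from itertools import accumulate

# Upper class boundaries and frequencies taken from the table above.
upper_boundaries = [29.5, 34.5, 39.5, 44.5, 49.5]
frequencies = [5, 12, 25, 11, 7]

# Each cumulative frequency is the running total of the frequencies so far.
cumulative = list(accumulate(frequencies))

# Points to plot for the ogive: (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, cumulative))

total = cumulative[-1]
below_39_5 = cumulative[upper_boundaries.index(39.5)]
above_39_5 = total - below_39_5

print(ogive_points)
print(below_39_5, above_39_5)
```

The last cumulative frequency equals the sum of the frequencies, which is a convenient check on the arithmetic.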

Relative Measures
They include the following:
1. Ratio
A ratio is one number expressed in relation to another, obtained by dividing the one number by
the other.
Example
In 2004 Thika had 343200 females and 322968 males. Calculate the ratio of females to males
and of males to females.

Ratio of females to males = 343200 / 322968 ≈ 1.06

Ratio of males to females = 322968 / 343200 ≈ 0.94
The interpretation of sex ratio is that for every male there are 1.06 females. Sometimes we
express this as the ratio per 100, 1000 or 100000 persons and we could comfortably say 106
females for every 100 males.
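
The same arithmetic as a short Python sketch (the variable names are illustrative only):

```python
females = 343200
males = 322968

females_to_males = females / males
males_to_females = males / females

# Expressed per 100 males, as is often done for sex ratios.
females_per_100_males = round(females_to_males * 100)

print(round(females_to_males, 2), round(males_to_females, 2))
print(females_per_100_males)
```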
Other ratios commonly used are:
Population Density
Population density is a measurement of the number of people in an area. It is an average number.
Population density is calculated by dividing the number of people by area. Population density is
usually shown as the number of people per square kilometer.
Example
In 1967 Thika District had 666168 persons in an area of 1955 km². Determine the population
per square kilometer.
If 1955 km² = 666168 persons
1 km² = ?
(1 × 666168) / 1955 ≈ 341 persons per square kilometer
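
A minimal sketch of the density calculation. The function name population_density is hypothetical; it simply divides the number of people by the area.

```python
def population_density(population, area_km2):
    """Average number of persons per square kilometer."""
    if area_km2 <= 0:
        raise ValueError("area must be positive")
    return population / area_km2


thika_density = population_density(666168, 1955)
print(round(thika_density))
```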
Exercise
In Turkana County there were 441946 persons in an area of 426 km². Calculate the
population density.
Dependency Ratio
It’s a measure of the portion of a population which is composed of dependents (people who are
too young or too old to work). The dependency ratio is equal to the number of individuals aged
below 15 or above 64 divided by the number of individuals aged 15 to 64.
Example
In Thika District in the year 2010, the number of persons under the age of 15 was 152869 while
that of persons aged 65 and over was 90329. If the persons aged between 15 and 64 were 462396,
calculate the dependency ratio for Thika District.
Dependency ratio = (152869 + 90329) / 462396 ≈ 0.53
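
The dependency ratio can be sketched the same way. The function name is hypothetical, and the working-age figure used is 462396, the value that appears in the calculation above.

```python
def dependency_ratio(under_15, aged_65_plus, aged_15_to_64):
    """Dependents (young plus old) divided by the working-age population."""
    return (under_15 + aged_65_plus) / aged_15_to_64


thika_ratio = dependency_ratio(152869, 90329, 462396)
print(round(thika_ratio, 2))
```

A ratio of 0.53 can be read as roughly 53 dependents for every 100 persons of working age.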

Proportion
It is a special kind of ratio in which the denominator is the total while the numerator is a
subset of the total.
Thus, while the ratio of females to males in Thika was 1.06, females represent a proportion of
0.515 of the total, i.e.

343200 / (343200 + 322968) ≈ 0.515
Percentage is a number or ratio expressed as a fraction of 100. It is often denoted using the
percent sign, “%”, or the abbreviation “pct.”
To calculate a percentage we simply multiply a proportion by 100. Females in the above
example are 51.5 % of the total population.
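
As a brief sketch, the proportion and percentage of females follow directly from the two counts (variable names are illustrative):

```python
females = 343200
males = 322968
total = females + males

# A proportion uses the total as its denominator.
proportion_female = females / total

# A percentage is the proportion multiplied by 100.
percent_female = proportion_female * 100

print(round(proportion_female, 3), round(percent_female, 1))
```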

2. Rates
Rates are special forms of ratio which represent the probability of a certain event. The
numerator is the number of persons experiencing the event during a time period and the
denominator is the number of persons exposed to the risk of that event in the same time period.
To be a true rate, the denominator should contain only the persons at risk; when the total
population is used as the denominator instead, the measure is generally called a crude rate.
Crude birth rate is given by the number of live births per 1000 population in a given year.
Example
In 2011, the number of live births was 10390 in a population of 705594 persons. Determine the
crude birth rate for this population in the year 2011.
Crude birth rate = (10390 / 705594) × 1000 ≈ 14.7 live births per 1000 population
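
A minimal sketch of the crude birth rate calculation. The function name crude_rate and the per parameter are hypothetical; the multiplier of 1000 follows the definition given above.

```python
def crude_rate(events, population, per=1000):
    """Number of events per `per` persons in the total population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return events / population * per


crude_birth_rate = crude_rate(10390, 705594)
print(round(crude_birth_rate, 1))
```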

GENERAL REVISION QUESTIONS

1. These data represent the record high temperatures in degrees Fahrenheit (°F) for
50 towns in Kenya. Construct a grouped frequency distribution for the data using
7 classes.
112 100 127 120 134 118 105 110 109 112
110 118 117 116 118 122 114 114 105 109
107 112 114 115 118 117 118 122 106 110
116 108 110 121 113 120 119 111 104 111
120 113 120 117 105 110 118 112 114 114

2. State and describe 6 types of charts used in statistics.

3. State the meaning of the following terms as used in statistics.
i. Population
ii. Sample
iii. Data
iv. Variables
v. Datum
4. Distinguish between the following terms as used in statistics.
i. Target population and accessible population
ii. Continuous variables and discrete variables
iii. Systematic errors and random errors
iv. Accuracy and precision
5. Describe the following types of sampling.
i. Random sampling
ii. Cluster sampling
iii. Stratified sampling
iv. Systematic sampling
v. Multistage sampling
vi. Quota sampling
vii. Convenience sampling
