Stt102 PDF
GUIDE
STT102
INTRODUCTORY STATISTICS
Lagos Office
14/16 Ahmadu Bello Way
Victoria Island, Lagos
e-mail: [email protected]
URL: www.nou.edu.ng
Printed
ISBN: 978-058-643-1
STT102 COURSE GUIDE
CONTENTS PAGE
Introduction
Course Competencies
Course Objectives
Working Through This Course
Presentation Schedule
Assessment
Assignments
Examination
How To Get The Most From The Course
Facilitation
Learner Support
INTRODUCTION
The course consists of twenty units divided into four modules of content
of five units each. It involves basic principles in collection, compilation,
analysis, presentation of data and the drawing of conclusions from
statistical analysis. The material has drawn on several practical
examples from the local environment so that the relevance of theory to
the practical situation may be appreciated. Extensive use of the media
and several case studies in the social and health sectors from information
collection agencies and government data support the student’s study.
Since computers are being used to an increasing extent in the hospital
services, and since they are ideal for performing statistical
calculations and analysis, the student is advised to be computer
literate. However, there is no need to write one’s own computer programs,
since programs already exist for almost all aspects of statistics.
The course has no prerequisites.
COURSE COMPETENCIES
The aim of the course is to introduce the student to the statistical process
and methods in common use.
It will be achieved by:
COURSE OBJECTIVES
At the end of this course, you should be able to attain the following:
You are required to read the study units, set books and other materials
provided by the National Open University to complete the course. You
will also need to work through practical and self-assessed exercises and
submit assignments for assessment purposes. The course will take you
about hours to complete at the end of which you will write a final
examination.
Module 1
Module 2
Module 3
Unit 1 Regression
Unit 2 Simple Concepts of Probability
Unit 3 Relationship between Population and Sample
Unit 4 Normal Distribution
Unit 5 Sampling Distribution of the Mean and the Central Limit
Theorem
Module 4
PRESENTATION SCHEDULE
The weekly activities are presented in Table 1 while the required hours
of study and the activities are presented in Table 2. This will guide your
study time. You may spend more time in completing each module or
unit.
ASSESSMENT
Table 3: Assessment
S/N Method of Assessment Score (%)
3 Tutor Mark Assignments 30
4 Final Examination 100
Total 100
ASSIGNMENTS
Take the assignment and click on the submission button to submit. The
assignment will be scored, and you will receive feedback.
EXAMINATION
Finally, the examination will help to test the cognitive domain. The test
items will be mostly application and evaluation items that lead to the
creation of new knowledge or ideas.
FACILITATION
There will be two hours of online real time contact per week
making a total of 26 hours for thirteen weeks of study time.
At the end of each video conferencing, the video will be uploaded
for view at your pace.
You are to read the course material and do other assignments as
may be given before video conferencing time.
The facilitator will concentrate on main themes.
The facilitator will take you through the course guide in the first
lecture at the start date of facilitation.
Send you videos and audio lectures, and podcasts if need be.
Read all the comments and notes of your facilitator, especially on your
assignments, and participate in forum discussions. This will give you the
opportunity to socialise with others in the course and build your skill for
teamwork. You can raise any challenge encountered during your study.
To gain the maximum benefit from course facilitation, prepare a list of
questions before the synchronous session. You will learn a lot from
participating actively in the discussions.
LEARNER SUPPORT
COURSE INFORMATION
MAIN
COURSE
CONTENTS PAGE
Module 1
Module 2
Module 3
Unit 1 Regression
Unit 2 Simple Concepts of Probability
Unit 3 Relationship between Population and Sample
Unit 4 Normal Distribution
Unit 5 Sampling Distribution of the Mean and the Central Limit Theorem
Module 4
Module Introduction
Unit Structure
1.1 Introduction
1.2 Intended Learning Outcomes (ILOs)
1.3 Main Content
1.3.1 Definition of Statistics
1.3.2 Decision Making
1.3.3 Population and Sample
1.3.4 Variable and Observation
1.4 Self-Assessment Exercise(s)
1.5 Conclusion
1.6 Summary
1.7 References/Further Readings
1.1 Introduction
This unit focuses mainly on the aims and application of the statistical
method in nursing education and practice. It gives definitions of basic
statistical concepts and states salient reasons why information or data
relating to health issues require statistical treatment. We thus have to
look at what you have to learn in this unit in the objectives stated
hereunder.
Statistics is the science that deals with collecting and summarizing facts
which are expressible in numerical form. It also involves the
measurement and comparison of facts ultimately leading to the
discovery of the existence of significant relationships between them.
This is helpful in revealing trends so that important estimates or
forecasts may be carried out.
You will observe from the above definition of Statistics that many of
the questions we often ask have statistical implications. For example:
What is the average height of ten-year-old girls in Nigeria? How many
Nigerian mothers wean their babies at nine months old? What is the life
expectancy of a Nigerian woman? You often engage in such discussions
locally or at your workplace. There is a variety of opinions on such
subjects, yet no tangible information results unless reliable data are
available.
These questions are not only of grave consequence to the well-being of
individuals and groups of people but also have economic, political and
health implications for the country. You will therefore notice the
relevance and importance of Statistics in the next section and in the
course in general.
1.3.2 Decision-Making
Let us reflect on the questions raised in the previous section of the unit
and on several everyday actions we engage in. You will notice that we
make several decisions in our daily life. Some of these decisions may
be simple, while others are consequential and involve some degree of
uncertainty. You will further notice that decisions are usually made with
regard to the available information, given or assumed.
Quite reasonably then, numerical information is preferable since it
presupposes an assessment of consequences. Statistics may therefore be
1.5 Conclusion
In this unit, you have been able to identify the aims of the statistical
method. You should also be able to define and recognize some basic
statistical terms, and to relate these concepts to some case studies
occurring in nursing practice and other cognate areas.
1.6 Summary
What you have learned in this unit deals with the extent to which the
statistical method is required in the analysis and interpretation of figures
relating to health issues. You also learned that the analysis and
interpretation of figures are influenced by factors which include, for
instance, the variability of human beings in their illnesses, in their
reactions to them and ultimately in the treatment of these illnesses.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
2.1 Introduction
2.2 Intended Learning Outcomes (ILOs)
2.3 Main Content
2.3.1 Is a Statistical Study Necessary?
2.3.2 Simple Random Sampling
2.3.3 Systematic Random Sampling
2.3.4 Cluster Sampling
2.3.5 Stratified Sampling
2.3.6 Multistage Sampling and Observation
2.4 Self-Assessment Exercise(s)
2.5 Conclusion
2.6 Summary
2.7 References/Further Readings
2.1 Introduction
This unit is concerned not only with how to plan and conduct a
statistical study but also with the methods of gathering an adequate
amount of reliable information, which needs to be treated scientifically.
the population is selected more than once. The other is simple random
sampling without replacement, where a member of the population is
selected at most once.
You should also note that obtaining a simple random sample by picking
slips of paper out of a container is not practical, especially when the
population being sampled is large. Several practical processes for
getting simple random samples exist. A table of random numbers is one
common method. You should also note that computers can be employed
to obtain a simple random sample from a population.
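As a minimal sketch of how a computer can draw a simple random sample, the Python snippet below uses the standard `random` module. The population of 500 ID numbers, the sample size of 10 and the seed are all hypothetical values chosen for illustration; they do not come from the course text.

```python
import random

# Hypothetical population: ID numbers 1..500 (an assumption for illustration).
population = list(range(1, 501))

# A fixed seed (also an assumption) makes the draw repeatable.
rng = random.Random(2024)

# Simple random sampling WITHOUT replacement:
# each member of the population is selected at most once.
without_replacement = rng.sample(population, k=10)

# Simple random sampling WITH replacement:
# the same member may be selected more than once.
with_replacement = rng.choices(population, k=10)

print("Without replacement:", without_replacement)
print("With replacement:   ", with_replacement)
```

The two calls mirror the two kinds of simple random sampling described above: `sample` never repeats a member, while `choices` may.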
This method is useful in the case when members of the population under
consideration are widely scattered geographically.
Cluster sampling can be executed in the following three steps:
Example 2.2.1
The annual salaries of five Government Officials are shown in Table
2.3.1 below. These are in millions of Naira, rounded to the nearest
million.
Official Salary
Governor (G) 5
Deputy Governor 4
Secretary to the State Govt. (SSG) 3
Head of Service (H) 2
Permanent Secretary (P) 1
(i) You will observe that there are ten possible samples of two
salaries from the population of five salaries. The listing in the
table below is done using the letters in parenthesis to represent
the officials.
Table 2.2.2: Possible samples of two salaries from the population of five
salaries
(ii) You will notice that the procedure used in (ii) is simple random
sampling. You will also notice that each of the possible samples
of two salaries is equally likely to be selected. There are ten
possible samples, and the chance of selecting any particular
sample is 1/10 (1 in 10).
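The count of ten possible samples can be checked with a short Python sketch. The salaries are those of Example 2.2.1; the label "D" for the Deputy Governor is an assumption, since the table gives no letter for that official.

```python
from fractions import Fraction
from itertools import combinations

# Salaries in millions of Naira from Example 2.2.1.
# "D" for the Deputy Governor is an assumed label.
salaries = {"G": 5, "D": 4, "SSG": 3, "H": 2, "P": 1}

# Every possible sample of two officials, drawn without replacement.
samples = list(combinations(salaries, 2))
for pair in samples:
    print(pair, [salaries[name] for name in pair])

# Under simple random sampling each sample is equally likely.
chance = Fraction(1, len(samples))
print("Number of samples:", len(samples))
print("Chance of any particular sample:", chance)
```

The number of samples agrees with the combination formula, 5!/(2! x 3!) = 10.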
Example 2.2.2
Answer
You will notice that a simple random sampling process was used. You
will also observe that the use of this process was possible because a
listing of all pupils in the city was available. Investigator-induced bias
was avoided since the names of the pupils in the study were chosen
through the use of a random number table.
Example 2.2.3
Answer
Example 2.2.4
A Nigerian town has 25 major homesteads, each divided into 150
houses. A health inspector wishes to carry out a survey on the attitudes
of the town dwellers toward HIV. Discuss possible sampling schemes for
carrying out the survey.
Answer
(i) A simple random sample of names drawn from a list of all town
dwellers is not practicable if such a list does not exist or is not
up-to-date.
(ii) A simple random sample of homesteads from among the 25
homesteads is taken.
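One way to picture a scheme built on step (ii) is the sketch below. It assumes that every house in each chosen homestead is then surveyed (that is, cluster sampling), and the choice of 5 homesteads and the seed are hypothetical; the text does not state how many homesteads are drawn.

```python
import random

rng = random.Random(7)                # seed chosen only for repeatability

homesteads = list(range(1, 26))       # the 25 homesteads, numbered 1..25
houses_per_homestead = 150            # from the example
n_clusters = 5                        # assumed number of homesteads to draw

# Stage 1: simple random sample of homesteads (the clusters).
chosen = sorted(rng.sample(homesteads, k=n_clusters))

# Stage 2 (assumption): survey every house in each chosen homestead.
surveyed_houses = [
    (homestead, house)
    for homestead in chosen
    for house in range(1, houses_per_homestead + 1)
]

print("Chosen homesteads:", chosen)
print("Houses to survey:", len(surveyed_houses))
```

Only a list of the 25 homesteads is needed to start, which is why this kind of scheme remains workable when no list of all town dwellers exists.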
2.5 Conclusion
In this unit, you have learned that nursing practice and research involve
the conduct of statistical surveys dealing with observations on humans.
This type of research or practice is non-experimental in nature because it
is not always possible to manipulate the study environment.
You also learned that statistical surveys in nursing practice and research
fall under three categories, namely, the cross-sectional study, the
retrospective study and the prospective study. You learned that, in
general, these studies are often conducted for the purpose of establishing
associations between disease states and the presence of a risk factor.
You also learned that the nature of these surveys determines the
procedure of collection of data or information required for them.
Furthermore, you learned the various procedures for the collection of
data and what particular sampling process is suitable for conducting any
of the surveys.
2.6 Summary
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
3.0 Introduction
3.1 Intended Learning Outcomes (ILOs)
3.2 Main Content
3.2.1 Examples of Bias
3.2.2 Sex Ratio at Birth
3.2.3 Hospital Statistics
3.2.4 Treatment Day
3.2.5 Statistics Relating to Post-Mortem
3.2.6 Patients Follow-up Studies
3.2.7 Infant Feeding
3.2.8 Uses of Questionnaires Sampling and Observation
3.3 Self-Assessment Exercise(s)
3.4 Conclusion
3.5 Summary
3.6 References/Further Readings
3.0 Introduction
You will learn in this unit how bias may be introduced into any
statistical survey conducted in the health sector.
You will find that the frequency with which male and female births
are recorded in the hospitals and local government councils is unlikely
to be representative of the births in the country as a whole. This
important information is distorted by scanty records in the rural areas or
by cultural preferences, for example proud parents who are likely to
record only the births of their sons.
Whatever the reasons, the sex ratio at birth may not be easily available,
and as such one cannot generalize from such a sample of births to the
population of the whole country without introducing some degree of
bias.
You will observe from the above points that it is not possible to
generalize, with any approach to accuracy, about the fatality rate or
the success of treatment of some diseases from samples drawn from
hospital statistics without incorporating a measure of bias.
One other source of statistical difficulty in assessing the value of some
form of treatment of a disease is the treatment day. For instance, the
fatality rate or success of treatment of a disease observed at different
stages is likely to be seriously biased.
You should notice from the above points that any feature observed at
death is most unlikely to be representative of the living population.
There can never be the slightest guarantee that the individuals who
decide to reply are representative of the sample of all the individuals
approached. You may correct the situation by stating the number of
3.4 Conclusion
In this unit, you have learned that a statistical study may be influenced
by some degree of bias. You have also learned some sources of this bias
in medical and nursing practice. In addition, you also learned the various
ways in which these sources of bias could be avoided or corrected.
3.5 Summary
In this unit, you learned that if you wish to generalize from some sample
group of observations, you must determine a sample which is
representative of the population to which it belongs.
If you select or accept samples deliberately, you should realize that bias
may occur through the operation of several factors, leading to a sample
which is not representative of the total population. Self-selection of
members of a group and the absence of some of the required records,
e.g. from individuals who do not reply to a questionnaire, are common
forms of bias.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Jefferies, P.M.(1995) Mathematics in Nursing, 4th edition, Bailliere
Tindall, London: Cassell and Collier, Macmillan Publishers
Ltd.
Unit Structure
4.0 Introduction
4.1 Intended Learning Outcomes (ILOs)
4.2 Main Content
4.2.1 Questions and Answers
4.2.2 Design of form of Computer Use
4.3 Self-Assessment Exercise(s)
4.4 Conclusion
4.5 Summary
4.6 References/Further Readings
4.0 Introduction
In this unit, we will discuss the last factor that influences the collection
of data when conducting a statistical survey. This deals with forms of
record.
For example, you need to construct the form in such a way that age last
birthday is required. You will be aware that date of birth may even be
preferable to age last birthday, since date of birth is constant while age
can be calculated from it from time to time.
(ii) You need to ensure that every question in the form is
self-explanatory. Respondents should not need to consult a separate
sheet of instructions to answer questions on the form of record.
(iii) You should ensure that every question requires some answer.
This guarantees that the respondent offers useful information or
indicates whether he or she possesses a characteristic sought.
(iv) You should specify the degree of accuracy required in answering
every question. For example, state whether body temperatures are
to be taken orally or rectally.
(v) You need to ensure that any form of record which must be
completed by many people should be worded simply and
logically.
You should ask questions that vary widely on circumstances.
You should distribute a large number of questions over different
samples.
You should be aware that a shorter form with fewer questions may
promote greater accuracy of reply and also reduce the amount of
non-response.
Nowadays, almost all data are transferred to computer files, usually
by typing the information in by hand from the completed
questionnaires. You should ensure that the form is designed so as to
facilitate this transfer.
4.4 Conclusion
4.5 Summary
One of the most decisive and difficult tasks in any inquiry is the
construction of a suitable form of record. You learned that questions
posed should be clear and unambiguous. Each question should require
some answer and entail a standard of accuracy.
You also learned that the forms need to be designed so as to facilitate
transfer of information to a computer file.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
5.0 Introduction
5.1 Intended Learning Outcomes (ILOs)
5.2 Main Content
5.2.1 Tabulation
5.2.2 Diagrams, Charts and Graphs
5.2.3 Pictograms
5.2.4 Block and Bar Diagrams
5.2.5 Pie Charts
5.2.6 Graphs
5.3 Self-Assessment Exercise(s)
5.4 Conclusion
5.5 Summary
5.6 References/Further Readings
5.0 Introduction
This unit is concerned with the meaningful manner in which raw data
(which are usually in the form of large sets of unorganized numerical
values) are summarized and interpreted so that important features and
trends may be identified.
A set of data may be presented in tables or described by means of
diagrams, charts and graphs. Before discussing these terms, let us look
at what you should learn in this unit, as stated in the objectives below:
5.2.1 Tabulation
You need to be aware that a statistical table should have the following
characteristics:
5.2.3 Pictograms.
Example 5.1
Answer
1990/1991
2000/2001
Example 5.2
[Figure: chart with the years 1990 to 1999 on the horizontal axis]
Example 5.3
Attitude scores of five newly admitted nursing students towards
alcoholic patients are given as follows:
5, 2, 3, 1, 4
The scores are coded as 1 = very negative, 2 = slightly negative,
3 = slightly positive, 4 = positive, 5 = very positive.
Use a pie chart to convey the information.
Step 1: You will calculate the percentage that each attitude score
is of the total. These are shown in the table above.
Step 2: You will find the size in degrees of each proportion in step
1. The total is 360°, which is the angle in the circle.
Step 3: You will draw a circle of reasonable size and mark the
angles obtained in step 2.
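Steps 1 and 2 can be sketched in Python as follows, using the five attitude scores of Example 5.3. The variable names are illustrative only, and drawing the circle itself (step 3) is left out.

```python
# Attitude scores of the five students from Example 5.3.
scores = [5, 2, 3, 1, 4]
total = sum(scores)  # 15

# Step 1: percentage that each score is of the total.
percentages = [100 * s / total for s in scores]

# Step 2: size in degrees of each sector (the full circle is 360 degrees).
angles = [360 * s / total for s in scores]

for s, p, a in zip(scores, percentages, angles):
    print(f"score {s}: {p:5.1f}%  {a:5.1f} degrees")

print("Sum of angles:", sum(angles))
```

The sectors come to 120, 48, 72, 24 and 96 degrees, which add up to the 360 degrees of the circle.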
Graphs
A very good graph should possess a clear layout and indicate the
following:
(i) Title
(ii) Unit of measure
(iii) Scale
(iv) Source of data
The following example will teach you how to construct a good graph.
Example 5.4
The scale should usually start from zero. You need to note that a
position on the graph is located by the coordinates of the point. For
example, the position of the number of students admitted in 1999, given
as 1400, is located by moving along the x-axis until 1999 and, at this
point, moving upward on the scale until you get to the value, as shown in
figure 5.2. When you have located all the points in this way, join
them in the order presented in the table by means of lines.
[Figure 5.2: line graph with the years 1990 to 1999 on the horizontal axis]
Of 100 patients in an orthopedic hospital who were asked for their room
preferences, 50 wanted private rooms, 40 wanted semi-private and 10
would make do with any room. Present this data by means of a bar chart.
5.4 Conclusion
In this unit, you learned how raw data are summarized and interpreted.
In this regard, you saw that tabulation is essential for the
comprehension of a series of figures.
5.5 Summary
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Module Introduction
Unit Structure
1.0 Introduction
1.1 Intended Learning Outcomes (ILOs)
1.2 Main Content
1.2.1 Frequency Distribution and Frequency Tables Using
Discrete Data
1.2.2 Frequency Tables Using Class Intervals
1.2.3 Relative Frequency
1.2.4 Histograms
1.2.5 Frequency Polygon
1.3 Self-Assessment Exercise(s)
1.4 Conclusion
1.5 Summary
1.6 References/Further Readings
1.0 Introduction
In this unit, you will learn one of the most significant and fundamental
processes by which raw and unorganized data are displayed. This involves
You will learn how this table can be constructed for discrete and
continuous data. You will also learn how to compare the frequency
distributions of two different sets of data in this study unit. For ease
of interpretation, this comparison is usually done through the
proportion or percentage of the total number of observations falling into
each interval, which is known as the relative frequency, i.e.
\[ \text{relative frequency} = \frac{\text{absolute frequency}}{\text{total number of observations}} \times 100\% \]
8 8 6 5 2 4 6 4 6 6
5 6 6 2 8 7 6 3 2 6
You will tabulate the number of times each score appears in the sample
and display the result as given below:
The table you have obtained is called a frequency table and it displays
the frequency distribution of the 20 attitude scores.
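The tabulation can be sketched in Python with `collections.Counter`, using the 20 attitude scores listed above. The relative frequency column is added here for illustration, computed as the absolute frequency divided by the total number of observations, times 100%.

```python
from collections import Counter

# The 20 attitude scores given in this unit.
scores = [8, 8, 6, 5, 2, 4, 6, 4, 6, 6,
          5, 6, 6, 2, 8, 7, 6, 3, 2, 6]

frequency = Counter(scores)   # absolute frequency of each score
n = len(scores)               # total number of observations

print("Score  Frequency  Relative frequency")
for score in sorted(frequency):
    relative = 100 * frequency[score] / n
    print(f"{score:>5}  {frequency[score]:>9}  {relative:>17.1f}%")
print(f"Total  {n:>9}")
```

For instance, the score 6 appears 8 times out of 20, a relative frequency of 40%.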
40 25 30 30 40 40 41 40 42 40
40 40 42 42 50 52 55 60 60 65
40 61 62 60 72 60 40 42 48 42
70 67 65 40 42 42 40 40 40 40
Score Frequency
25 1
30 2
40 14
41 1
42 7
48 1
50 1
52 1
55 1
60 4
61 1
62 1
65 2
67 1
70 1
72 1
Total 40
It is also possible to represent the data in Tables 6.1 and 6.2 by
means of a frequency diagram, as shown below. You will observe that in
this diagram frequency is given on the vertical axis while the data
values are given on the horizontal axis. The height of each bar stands for
the frequency of occurrence of each score.
One other important feature you will study is the frequency diagram or
the shape of distribution of a collection of data. Let us consider the
examples shown in figure 6.3 below.
You will observe that in (i) and (ii) in figure 6.3; the frequencies are
almost equal on either side of a center point. We refer to a distribution of
data values with this shape as a symmetric distribution. On the other
hand, in (iii) and (iv) in the figure, the frequencies are not equal on
either side of the center point. Such a distribution of data values with
this shape is said to be non-symmetric.
In this case, you will divide the data values into groups (classes or
intervals) and then record the number of data values which fall
into each interval. To illustrate this technique, let us consider the data
representing the weights in pounds of 30 cancer patients given below:
You will notice that many of the data values occur only once or twice in
the sample. Hence, tabulating the frequency of occurrence of each possible
observation provides little more information than the raw data alone.
Let us see how to construct the frequency distribution in this case. First,
you will arrange the weights in ascending order of magnitude.
Table 1.3: Weights (in pounds) Arranged in Ascending Order of
Magnitude
62.5 63.4 67.5 70.7 77.6
62.6 64.1 68.7 70.8 78.2
62.8 64.5 70.3 70.9 79.6
62.8 64.7 70.3 71.5 80.2
62.9 66.4 70.4 72.9 80.4
63.4 66.9 70.5 77.3 80.5
You should note that too few intervals result in loss of valuable
information about shape and distribution of data while too many
intervals do not convey any meaningful information. In general, you will
arrive at the right choice of number of suitable intervals by trial and
error.
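As a sketch of one such trial, the Python snippet below groups the 30 weights of Table 1.3 into ten classes of width 2 with class boundaries 60.95, 62.95, ..., 80.95. This particular choice of boundaries is an assumption made for illustration; other choices are equally legitimate.

```python
# The 30 weights (in pounds) from Table 1.3, in ascending order.
weights = [62.5, 62.6, 62.8, 62.8, 62.9, 63.4, 63.4, 64.1, 64.5, 64.7,
           66.4, 66.9, 67.5, 68.7, 70.3, 70.3, 70.4, 70.5, 70.7, 70.8,
           70.9, 71.5, 72.9, 77.3, 77.6, 78.2, 79.6, 80.2, 80.4, 80.5]

lower = 60.95      # assumed lower boundary of the first class
width = 2.0        # assumed class width
n_classes = 10     # assumed number of classes

# Count how many weights fall into each class interval.
counts = [0] * n_classes
for w in weights:
    index = int((w - lower) // width)   # which class the weight belongs to
    counts[index] += 1

for i, count in enumerate(counts):
    low = lower + i * width
    high = low + width
    print(f"{low:.2f} to {high:.2f}: {count}")
print("Total:", sum(counts))
```

Because the boundaries carry one more decimal place than the data, no weight can fall exactly on a boundary. Changing `width` and `n_classes` shows how too few or too many intervals alter the picture.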
frequency table of ages of 1000 normal girls selected at random from the
population of normal girls.
Suppose you need to compare 50 girls suffering from VVF and aged
10-12 years with 100 normal girls of the same age group. Then, the
proportion represented by the 50 girls with VVF in the age range 10-12
years is much greater than that for the normal girls.
1.2.4 Histogram
Let us see how to construct a histogram using the data in Table 6.6
below:
[Figure: histogram of the weights, with class boundaries 60.95, 62.95, ..., 80.95 on the horizontal axis]
You next join the dots by straight lines. You should be aware that
frequency polygons or relative frequency polygons are useful when
comparing the frequency distributions of two or more sets of data.
[Figure: frequency polygon of the weights, with class boundaries 60.95, 62.95, ..., 80.95 on the horizontal axis]
1.4 Conclusion
1.5 Summary
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Unit Structure
2.0 Introduction
2.1 Intended Learning Outcomes (ILOs)
2.2 Main Content
2.2.1 Class Boundaries
2.2.2 Cumulative Frequency
2.2.3 Cumulative Relative Frequency
2.2.4 Ogive
2.3 Self-Assessment Exercise(s)
2.4 Conclusion
2.5 Summary
2.6 References/Further Readings
2.0 Introduction
You will learn how this concept is displayed in a table and as a graph.
You will also learn how an associated concept (relative cumulative
frequency) is obtained as well as the interpretation of both of them.
You will observe that this concept was discussed and used in the
previous unit. It is a method of showing the classes on the horizontal
axis of a histogram. Recall that the lower class boundary of a class is the
number halfway between the lower class limit of the class and the
upper class limit of the next lower class.
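As a worked illustration, take two adjacent classes with limits 40-49 and 50-59, and assume (for the upper boundary only) that the next class begins at 60. The boundaries of the class 50-59 are then the midpoints between neighbouring limits:

```latex
\[
\text{lower class boundary} = \frac{49 + 50}{2} = 49.5,
\qquad
\text{upper class boundary} = \frac{59 + 60}{2} = 59.5
\]
```

The class 50-59 therefore occupies the interval from 49.5 to 59.5 on the horizontal axis, so adjacent bars of a histogram touch without gaps.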
For instance, consider the classes 40-49, 50-59 and 60-69 of the ages of
40 cancer patients given below (see Table 2.1). We have that
You will also notice that the number of cancer patients whose ages are
less than 80 is given by the cumulative frequency
. In the next part of the unit, we will discuss a related concept.
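A minimal sketch of how cumulative and cumulative relative frequencies are built up is given below. Since the age table is not reproduced here, the sketch reuses the 30 weights of Table 1.3 from the previous unit, grouped into classes of width 2 with upper class boundaries 62.95, 64.95, ..., 80.95; that grouping is an assumption made for illustration.

```python
from itertools import accumulate

# Upper class boundaries (assumed grouping of the weights in Table 1.3).
upper_boundaries = [round(62.95 + 2 * i, 2) for i in range(10)]

# Class frequencies counted from Table 1.3 under that grouping.
frequencies = [5, 5, 2, 2, 7, 2, 0, 0, 3, 4]

# Cumulative frequency: number of observations below each upper boundary.
cumulative = list(accumulate(frequencies))
n = cumulative[-1]   # total number of observations

# Cumulative relative frequency: proportion below each upper boundary.
cumulative_relative = [c / n for c in cumulative]

for b, c, r in zip(upper_boundaries, cumulative, cumulative_relative):
    print(f"less than {b:.2f}: {c:>2}  ({r:.2f})")
```

Plotting the cumulative (or cumulative relative) frequencies against the upper class boundaries and joining the points gives the ogive.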
2.2.4 Ogive
[Figure: ogive, with values from 10 to 90 on the horizontal axis and cumulative relative frequency on the vertical axis]
2.4 Conclusion
In this unit, you have learned the procedure for displaying cumulative
frequency or cumulative relative frequency distribution as a graph. You
also learned how to interpret a collection of data through this concept.
2.5 Summary
You have learned the procedures for presenting and summarizing raw
and unorganized sets of data. You also learned how to interpret these data
meaningfully in terms of graphs.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
3.0 Introduction
3.1 Intended Learning Outcomes (ILOs)
3.2 Main Content
3.2.1 Measures of Location or Central Tendency
3.2.2 Using the Summation Operator
3.2.3 Mean
3.2.4 Median
3.2.5 Mode
3.2.6 Choice of a Measure of Central Tendency
3.3 Self-Assessment Exercise(s)
3.4 Conclusion
3.5 Summary
3.6 References/Further Readings
3.0 Introduction
In two of the preceding units, you have learned that a collection of data
can be represented and summarized by graphs, diagrams, frequency and
cumulative distributions. You also learned that these techniques are
useful in showing some important features of the data.
You will now see in this unit that it is more desirable to describe a
collection of data in terms of some other numerical summaries which
play crucial roles in the inferential estimation of a population.
We shall discuss their advantages and situations when each one can be
used. You will also learn how to define and compute each of the terms.
Let us now look at a brief description of one of the most commonly used
arithmetic operations in statistics.
\[ \sum_{i=1}^{n} x_i \]
The symbol above means that you have taken the sum of the x values
starting from i=1 to i=n with n being the number of observations.
If you apply this arithmetic manipulation to the weight losses, you have
that the sum of the weight losses is given by
\[ \sum_{i=1}^{n} x_i \]
We will now look at the following examples which illustrate the
common uses of the summation operation:
Example 3.1
The ratings of the quality of nursing care in the hospital unit (on a
scale from 1 to 10) by six patients are 3, 2, 4, 1, 6, 7. Using these data,
compute the following using ∑.
(i) ∑
(ii) ∑
∑
(iii) ( )
(iv) ∑( )
(i) ∑
(ii) ∑
∑
(iii) ( ) ( )
( )
(iv) ∑( ) ( ) ( ) ( ) ( )
( ) ( )
(v)
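The snippet below sketches some computations commonly carried out with the summation operator on the six ratings. The three expressions chosen here (the sum of the values, the sum of their squares and the square of their sum) are assumptions made for illustration and are not necessarily parts (i) to (v) of the example.

```python
# The six ratings from Example 3.1.
ratings = [3, 2, 4, 1, 6, 7]

# Sum of the values: 3 + 2 + 4 + 1 + 6 + 7.
sum_x = sum(ratings)

# Sum of the squared values: 9 + 4 + 16 + 1 + 36 + 49.
sum_x_squared = sum(x ** 2 for x in ratings)

# Square of the sum: (3 + 2 + 4 + 1 + 6 + 7) squared.
square_of_sum = sum_x ** 2

print("Sum of x:", sum_x)
print("Sum of x squared:", sum_x_squared)
print("Square of the sum of x:", square_of_sum)
```

Note that the sum of the squares (115) is not the same as the square of the sum (529); the order of the operations matters.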
3.2.3 Mean
The mean of a set of data is the most commonly used measure of central
tendency. It is the arithmetic average of the collection of data, i.e. the
sum of the data values divided by the number of data values. It is
represented by
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
where $\bar{x}$ (read "x bar") denotes the mean, $\sum_{i=1}^{n} x_i$ is the
sum of all the values and $n$ is the number of data values in the
collection. An illustration of how to find the mean is given below:
Example 3.2
Answer
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Example 3.3
Answer
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
You will notice that if the mean is subtracted from each sample value,
the sum of these differences is zero. The difference between a sample
value and the mean is said to be a deviation. We shall discuss the
importance of this property of the mean in the next study unit.
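This property can be checked numerically. The sketch below borrows the six ratings of Example 3.1 (an assumption, since the data of Example 3.3 are not reproduced here) and uses exact fractions so that rounding error does not hide the result.

```python
from fractions import Fraction

# Six ratings borrowed from Example 3.1 for illustration.
values = [3, 2, 4, 1, 6, 7]

# Mean as an exact fraction: 23/6.
mean = Fraction(sum(values), len(values))

# Deviation of each value from the mean.
deviations = [x - mean for x in values]

# The deviations always sum to zero.
total_deviation = sum(deviations)

print("Mean:", mean)
print("Deviations:", [str(d) for d in deviations])
print("Sum of deviations:", total_deviation)
```

The result holds for any data set, because the sum of the values equals n times the mean, so subtracting the mean n times leaves zero.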
Example 3.4
Answer
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
You will observe that because of one large data value, the mean has increased considerably. This illustrates one disadvantage of the mean: it is affected by extreme values, particularly when a sample has a small number of observations.
3.2.4 Median
Example 3.5
In a research unit of a hospital, the hospital stay (in days) for seven
patients are; 12, 13, 15, 16, 17, 18, 20. Determine the median hospital
stay for these patients.
Answer
You will first arrange the data in order of increasing magnitude, i.e. 12, 13, 15, 16, 17, 18, 20.
The median hospital stay for these patients is 16 days, since there are seven observations and 16 is the fourth (middle) value.
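The procedure in Example 3.5 can be sketched in Python as follows. The variable names are illustrative only; `statistics.median` is part of the Python standard library.

```python
import statistics

# Hospital stays (in days) for the seven patients in Example 3.5.
stays = [12, 13, 15, 16, 17, 18, 20]

ordered = sorted(stays)              # arrange in increasing magnitude
middle_index = len(ordered) // 2     # index 3, i.e. the 4th of 7 values
median_by_hand = ordered[middle_index]

# The standard library gives the same result; for an even number of
# observations it averages the two middle values.
median_stay = statistics.median(stays)

print(median_by_hand, median_stay)   # 16 16
```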
Example 3.6
Answer
You will first arrange the data in order of increasing magnitude i.e.
Here the median is the average of the 4th and the 5th value
3.2.5 Mode
Example 3.7
15,15,15,12,13,14,16,12,13
Answer
The mode is 15 since this is the most frequently occurring score or value
in the collection of observations.
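A short sketch of the same count in Python, using the scores from Example 3.7, is given below. It simply tallies how often each value occurs and picks the most frequent one.

```python
from collections import Counter

# Scores from Example 3.7.
scores = [15, 15, 15, 12, 13, 14, 16, 12, 13]

counts = Counter(scores)                           # frequency of each score
mode_value, mode_count = counts.most_common(1)[0]  # most frequent score

print(mode_value, mode_count)  # 15 3
```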
You should be aware that the mode is rarely used as the only measure of the central tendency of a collection of data. It is beneficial to use two or three of the measures of central tendency to describe a sample under consideration.
3.4 Conclusion
In this unit, you have learned how to define and compute measures of central tendency for a set of data. You have also been able to summarize and describe data quantitatively using these measures. Furthermore, you have been able to subject data under consideration to statistical analysis.
3.5 Summary
You have been able to learn in this unit that the general position of a
frequency distribution on some scale is measured by an average. You
also learned that there are three averages or numerical summaries in
common use. These are
(i) The arithmetic mean;
(ii) The median; and
(iii) The mode
Furthermore, you learned that the mean has the disadvantage that it is affected by extreme values and that the median is preferable as a measure of central tendency in such a case.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
4.0 Introduction
4.1 Intended Learning Outcomes (ILOs)
4.2 Main Content
4.2.1 Measures of Location or Central Tendency
4.2.2 Measures of Dispersion
4.2.3 Range
4.2.4 Variance, Standard Deviation
4.2.5 Percentiles
4.3 Self-Assessment Exercise(s)
4.4 Conclusion
4.5 Summary
4.6 References/Further Readings
4.0 Introduction
Recall that in the study unit on presentation of data, you learned that pictorial representations and frequency distributions can describe a collection of unorganized data. In this unit, you will learn that apart from providing a mental image of the frequency distribution of a set of data, there is a means of calculating a measure that reflects the degree of spread of the observed values about the central point.
The most widely used measures of spread or variability are the range, the variance and the standard deviation. Other measures are percentages, percentiles, rates and ratios. Before discussing these concepts, you need to examine the following objectives of the unit.
However, it is possible for two data sets to have the same mean, the same median or the same mode and yet be quite different in other respects. For example, let us consider two sets of measurements in (a) and (b) below that are centered around the same mean value but do not have the same frequency distribution. You will observe that the diagrams reflect that a greater number of data values are well-scattered about the mean in (a) than in (b).
Figure 4.1: Two histograms with equal mean value but different spread
values about the mean.
4.2.2 Range
For a set of data, the range is the difference between the largest and the
smallest scores in the sample. Let us consider the following collection of
data: 30 40 50 60 70 80
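As a minimal sketch, the range of the collection above can be computed directly from its largest and smallest scores. The extra score of 200 in the second part is a hypothetical value, added only to show how a single extreme score inflates the range.

```python
# Collection of data from the text.
scores = [30, 40, 50, 60, 70, 80]

data_range = max(scores) - min(scores)   # largest minus smallest score
print(data_range)                        # 50

# Hypothetical extreme score (not from the source), for illustration only.
with_extreme = scores + [200]
extreme_range = max(with_extreme) - min(with_extreme)
print(extreme_range)                     # 170
```

Because only the two end values enter the calculation, the range says nothing about how the remaining scores are distributed between them.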
From the diagrams, you will notice that the range for both (a) and (b) is the same, but there is a remarkable difference between the spread of the points about the centre of the two frequency distributions. The majority of scores in (a) cluster about the middle values, while the scores in (b) are spread more evenly across the range.
You can therefore observe that though the ranges of (a) and (b) are the same, the scores in (b) have greater spread or variability than those in (a).
Another disadvantage of the range, as you can observe, is that it is affected by extreme scores in a collection of data. This means that the presence of one or more extreme scores results in a very large value for the range and this gives a misleading impression of the true spread of the data. Let us now consider the following examples:
Example 4.1
The largest and the smallest scores are and respectively. Hence the
range is
Example 4.2
Answer
The largest and the smallest scores are and respectively. Hence
the range is .
You will notice from Examples 4.1 and 4.2 that the range provides a misleading idea of the true spread or variability of a collection of data. This is because only two numbers are used in calculating the range. You need to be aware that by making use of all the measurements in a set, as well as their deviations from the centre point of a distribution, a more valid measure of variability of the measurements is obtained.
We define a deviation as the distance between a measurement in the set
and the mean value for the set. Let us look at the illustration of this term
using the data in Example 9.2.1.2. Deviations from the mean of the data
set are given in the third column of Table 4.1 below.
From Table 4.1, you will observe that the deviations of the individual measurements from the mean give an indication of how spread out the measurements are. You should be aware that the larger the deviations, the more dispersed the measurements are from the centre of the distribution (the mean). Each deviation is the difference
$$x - \bar{x}$$
You should note that the most widely used measure of variability is the standard deviation (s), while the variance is also a very significant indicator of the spread of data values. We now illustrate these concepts with the example of the data in Table 4.1.
Here the variance ($s^2$) of the ages of recipients of nursing scholarship, obtained from the squares of the deviations from the mean, is 215.60.
The standard deviation is the square root of the variance, $s = \sqrt{s^2}$.
You have now given a suitable description of the location of the center
(the mean) of a collection of data as well as how the measurements are
spread about the center (the standard deviation).
Example 4.3
                            Drug 1    Drug 2
Mean ($\bar{x}$)            100       180
Standard deviation ($s$)    50        70
Which of the drugs would you say is the more efficacious based on the above information?
Answer
Drug 2 has the higher mean (180) and a larger spread of values about the mean (standard deviation 70). This implies that for this collection of data there are some very short healing times as well as very long ones.
On the other hand, Drug 1 has most of its values clustered more closely about the mean of 100, showing that the healing times do not change appreciably in either direction from the mean. It therefore follows that, based on the absence of very long healing times, Drug 1 is adjudged the more efficacious.
Example 4.4
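Survival ($x$)    Deviation ($x - \bar{x}$)    $(x - \bar{x})^2$
3                 -15                          225
8                 -10                          100
12                -6                           36
18                0                            0
19                1                            1
20                2                            4
24                6                            36
24                6                            36
25                7                            49
27                9                            81
$\sum x = 180$    $\sum (x - \bar{x}) = 0$     $\sum (x - \bar{x})^2 = 568$
The mean $\bar{x} = \frac{\sum x}{n} = \frac{180}{10} = 18$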
The standard deviation
√ √
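The table for Example 4.4 can be checked with a short Python sketch using the ten survival values listed above. One assumption is flagged loudly: the surviving text does not show whether the variance is divided by n - 1 (the usual sample formula) or by n, so both results are computed here.

```python
import math
import statistics

# Survival values from Example 4.4.
survival = [3, 8, 12, 18, 19, 20, 24, 24, 25, 27]

n = len(survival)
mean = sum(survival) / n                      # 180 / 10 = 18.0
deviations = [x - mean for x in survival]     # x minus the mean
sum_of_squares = sum(d ** 2 for d in deviations)

# Assumption: sample standard deviation with divisor n - 1.
sample_sd = math.sqrt(sum_of_squares / (n - 1))
# Alternative: divisor n (population formula).
population_sd = math.sqrt(sum_of_squares / n)

print(mean, sum(deviations), sum_of_squares)  # 18.0 0.0 568.0
print(round(sample_sd, 2), round(population_sd, 2))
```

The deviations sum to zero, which is the property of the mean noted in the previous unit. `statistics.stdev` uses the n - 1 divisor and agrees with `sample_sd`.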
Example 4.5
You will construct the following table where the desired concepts are
easily observed:
∑
̅ ∑(
̅)
In the previous part of the study unit, you have seen how data may be summarized by tables, graphs and by means of numerical summaries like the mean and the standard deviation. You will now see how data may be summarized by another measure, the percentile. This measure represents data in relative form and facilitates the comparison between two sets of data.
You then divide this number by the total number of scores in the collection and multiply by 100 to convert to percent, i.e.
$$\text{Percentile} = \frac{\text{number of scores lower than the given score}}{\text{total number of scores}} \times 100$$
If 80 nurses took the qualifying examination and 16 nurses scored lower than nurse X, the percentile score of nurse X is given by
$$\text{Percentile} = \frac{16}{80} \times 100 = 20$$
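The percentile calculation for nurse X can be sketched as a small Python function. The function name `percentile_score` is hypothetical and chosen only for this illustration.

```python
def percentile_score(number_below: int, total: int) -> float:
    """Percent of scores in the collection that lie below a given score."""
    if total <= 0:
        raise ValueError("total number of scores must be positive")
    return number_below * 100 / total

# 16 of the 80 nurses scored lower than nurse X.
nurse_x = percentile_score(16, 80)
print(nurse_x)  # 20.0
```

A result of 20 means that 20 percent of the nurses who took the examination scored lower than nurse X.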
4.4 Conclusion
In this unit, you saw that certain numerical values are necessary as
descriptions of the frequency distribution of a collection of data.
You learned that the most important of these are usually the mean and
the standard deviation. In addition, you learned that the mean alone is
rarely used. Rather, apart from taking into consideration the center of a
frequency distribution, the measure of spread (or variability) it displays
about the center is significant. In essence, you should be aware of not
only the average of a collection of data but the spread of data around it.
4.5 Summary
UNIT 5 CORRELATION
Unit structure
5.0 Introduction
5.1 Intended Learning Outcomes (ILOs)
5.2 Main Content
5.2.1 Correlation: Meaning and Interpretation
5.2.2 Data Arrangement
5.2.3 The Scatter Diagram
5.2.4 Numerical Representation of Relationships between
Variables
5.2.5 Pearson’s Product-Moment Correlation Coefficient (r)
5.2.6 Interpretation of Correlation Coefficient
5.2.7 Precautions in use in the Interpretation of Correlation
5.2.8 Correlation Ratio
5.2.9 Spearman’s Rank Order Correlation Coefficient
Conclusion
5.3 Self-Assessment Exercise(s)
5.4 Conclusion
5.5 Summary
5.6 References/Further Readings
5.0 Introduction
Some widely used methods for examining the relationship between two
or more variables and for making predictions are correlation analysis
and regression analysis.
You may have noticed from the introduction that there are several
correlation procedures available. They provide the same type of
information on the direction and the magnitude of the relationship
between variables.
In a correlation study, data are usually arranged in pairs (Xi, Yi). For
example, let us see how to represent the following information as data
layout for correlation study.
You plot each of the n pairs of points (X, Y) on the graph with the X’s
and Y’s being plotted on the horizontal and vertical axes respectively.
You should try to obtain the scatter diagram showing the above point
measurements in Table 10.1 as given in Figure 10.1 below
Apart from the study of relationships through scatter diagrams, you need to be aware that a numerical representation of such a relationship exists. This is called the correlation coefficient, which measures the magnitude or strength of the relationship between variables.
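$$r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$$
where
$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$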
Let us illustrate this computation using the following example;
Example 5.1
7 2.0 0.5
∑ ∑
∑
∑
∑
Answer
(∑ )
∑
(∑ )(∑ )
∑
√ √
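The computational pattern used in this example (sums, sums of squares and a sum of cross-products, combined under a square root) can be sketched in Python as below. The data values are hypothetical, since the table for Example 5.1 is not reproduced here, and the names `s_xx`, `s_yy` and `s_xy` are illustrative labels for the three corrected sums.

```python
import math

# Hypothetical paired data (not the data of Example 5.1).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
s_xx = sum(v ** 2 for v in x) - sum_x ** 2 / n              # 55 - 45 = 10
s_yy = sum(v ** 2 for v in y) - sum_y ** 2 / n              # 86 - 80 = 6
s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n  # 66 - 60 = 6

r = s_xy / math.sqrt(s_xx * s_yy)   # Pearson correlation coefficient
print(round(r, 4))                  # 0.7746
```

A value of r close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.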
Let us consider the following example where Spearman’s rank order correlation coefficient is used:
Nurse No    Rank by Observer 1    Rank by Observer 2
1           2                     5
2           3                     1
3           4                     6
4           5                     2
5           1                     3
6           6                     4
You should note that Spearman’s Rank Order Correlation coefficient is used to measure the degree of relationship between the scoring of the two observers. It is given by
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
with $d$ being the difference between ranks for each individual and $n$ the number of pairs of ranks. To compute this coefficient for the previous example we have that
$$\sum d^2 = (2-5)^2 + (3-1)^2 + (4-6)^2 + (5-2)^2 + (1-3)^2 + (6-4)^2 = 34$$
$$r_s = 1 - \frac{6(34)}{6(6^2 - 1)} = 1 - \frac{204}{210} \approx 0.029$$
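The Spearman calculation for the six nurses can be checked with the short Python sketch below, which uses the ranks given in the table. The variable names are illustrative, and the formula applied is the standard one for untied ranks, 1 - 6Σd²/(n(n² - 1)).

```python
# Ranks assigned to the six nurses by the two observers.
observer_1 = [2, 3, 4, 5, 1, 6]
observer_2 = [5, 1, 6, 2, 3, 4]
n = len(observer_1)

# Squared difference between the two ranks for each nurse.
d_squared = [(a - b) ** 2 for a, b in zip(observer_1, observer_2)]
sum_d_squared = sum(d_squared)

r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print(sum_d_squared, round(r_s, 4))  # 34 0.0286
```

A coefficient this close to zero suggests very little agreement between the rankings of the two observers.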
5.4 Conclusion
In this unit, you saw that the correlation coefficient is a useful measure of the degree of association between two (or more) variables, but that it is valid only when a straight line adequately describes the relationship. You also learned that the error of estimation may be large even when the correlation is high. You also saw that evidence of association is not necessarily evidence of causation, and that the influence of other factors needs to be taken into consideration in order to interpret correlation coefficients meaningfully.
5.5 Summary
Introduction
Unit 1 Regression
Unit 2 Simple Concepts of Probability
Unit 3 Relationship between Population and Sample
Unit 4 Normal Distribution
Unit 5 Sampling Distribution of the Mean and the Central Limit
Theorem
UNIT 1 REGRESSION
Unit Structure
1.1 Introduction
1.2 Intended Learning Outcomes (ILOs)
1.3 Main Content
3.1 Linear Equation with one Independent Variable
3.2 Intercept and Slope
3.3 Graphical Interpretation of Slope
3.4 Regression Equation
3.5 Precaution on the use of Linear Regression
1.4 Self-Assessment Exercise(s)
1.5 Conclusion
1.6 Summary
1.7 References/Further Readings
1.1 Introduction
In the previous study unit, you learned one of the most commonly used methods for examining the relationship between two or more variables.
This unit is concerned with a topic closely related to the one discussed in the previous unit. Here you will learn the statistical technique of estimating or predicting the value of one variable given the value of a second variable. This technique is referred to as regression analysis.
You will need to review linear equations with one independent variable
in order to understand linear regression.
The first step in this direction is for you to observe that the general form of a linear equation with one independent variable is given by
$$y = b_0 + b_1 x$$
where $b_0$ and $b_1$ are constants.
The next step is to see that when the graph of a linear equation is
displayed, you will obtain a straight line. We draw the graphs of the
following three linear equations (see Figure 11.1 below) to illustrate this
concept:
(a) $y = 4 + x$
x    -4   -3   -2   -1    0    1    2    3    4
y     0    1    2    3    4    5    6    7    8
(b) $y = 3 - 2x$
x    -4   -3   -2   -1    0    1    2    3    4
y    11    9    7    5    3    1   -1   -3   -5
(c) $y = -1 + 3x$
x    -4   -3   -2   -1    0    1    2    3    4
y   -13  -10   -7   -4   -1    2    5    8   11
Then plot the points given in the tables and join the points with straight lines.
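The three tables above are consistent with the linear equations y = 4 + x, y = 3 - 2x and y = -1 + 3x. As a minimal sketch, the tables can be regenerated in Python; the helper name `tabulate_line` and the parameter names `b0` and `b1` are assumptions made for this illustration.

```python
def tabulate_line(b0: int, b1: int, xs: list) -> list:
    """Return the y-values of y = b0 + b1 * x for each x in xs."""
    return [b0 + b1 * x for x in xs]

xs = list(range(-4, 5))              # x = -4, -3, ..., 4

table_a = tabulate_line(4, 1, xs)    # y = 4 + x
table_b = tabulate_line(3, -2, xs)   # y = 3 - 2x
table_c = tabulate_line(-1, 3, xs)   # y = -1 + 3x

for label, ys in (("a", table_a), ("b", table_b), ("c", table_c)):
    print(label, ys)
```

In each table the y-value increases or decreases by the same amount for every unit step in x, which is why the plotted points fall on a straight line.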
Example 1.1
(b) Interpret the y-intercept and slope in terms of the graph of the
equation.
Answer
x 0 1 2 3 4 5
y 25 45 65 85 105 125
You need to know that a straight line is determined by any two distinct points that lie on the line. An implication of this is that you need to substitute only two different x-values into the equation to get two distinct points, and then connect those two points with a straight line.
Another point you need to notice is that the straight-line graph of the linear equation $y = b_0 + b_1 x$ slopes upward if $b_1 > 0$, slopes downward if $b_1 < 0$ and is horizontal if $b_1 = 0$, as shown in Figure 11.3 below.
Since you could draw many straight lines through the cluster of data points, you need a method to choose the best-fitting line. The statistical procedure for finding this line of best fit is called the method of least squares and the line so obtained is called the regression line. The equation of the line is referred to as the regression equation. The formal procedure for deriving the method of least squares is beyond the scope of this course.
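Although the derivation is outside the scope of the course, the result of the method of least squares is easy to apply. The sketch below uses the standard least-squares formulas (slope equal to the corrected sum of cross-products divided by the corrected sum of squares of x, and intercept equal to the mean of y minus the slope times the mean of x). The data values and the names `b0`, `b1`, `s_xx` and `s_xy` are hypothetical and are not taken from the examples in this unit.

```python
# Hypothetical paired observations, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
s_xx = sum(v ** 2 for v in x) - sum_x ** 2 / n               # 10.0
s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n  # 6.0

b1 = s_xy / s_xx                    # slope of the regression line
b0 = sum_y / n - b1 * sum_x / n     # intercept of the regression line

def predict(x_value: float) -> float:
    """Predicted y for a given x using the fitted regression equation."""
    return b0 + b1 * x_value

print(round(b0, 2), round(b1, 2), round(predict(6), 2))
```

For these hypothetical data the fitted line is approximately y = 2.2 + 0.6x, so the predicted value at x = 6 is about 5.8.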
Example 1.2
∑ ∑
∑
∑ ∑
Answer
( )
( )
( )( )
We now have
̇
̇
( ) ( )
You should note from the example above that, in the context of regression analysis, in the equation
$$\hat{y} = b_0 + b_1 x$$
y is called the response variable while x is referred to as the predictor or explanatory variable. This is because x is used to predict or explain the values of the response variable.
You have seen that the concept behind finding a regression line is based
on the assumption that the data points are scattered about a straight line.
But in some cases, data points may be scattered about a curve instead of
a straight line.
In such cases, techniques are available for fitting curves to data points
showing a curved pattern. These techniques involve curvilinear
regression.
In conclusion, you need to evaluate the sample regression line so as to determine whether it adequately describes the relationship between the variables x and y.
You will accomplish this through tests of hypothesis on the true slope of
the line. Again, this discussion is delayed until we discuss some basic
1.5 Conclusion
In this unit, one other significant and useful measure of the degree of
association between two characteristics of a population was discussed.
You have learned that this relationship is valid only when it is
adequately described by a straight line.
You also learned that the equation of this line is called the regression equation and that the equation allows the value of one characteristic to be estimated when the value of another characteristic is known.
1.6 Summary
Knapp, R.G (1985). Basic Statistics for nurses, 2nd Edn., Delmar
Publishers Inc. N.Y.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
79
STT102 INTRODUCTORY STATISTICS
Unit Structure
2.1 Introduction
2.2 Intended Learning Outcomes (ILOs)
2.3 Main Content
2.3.1 Classical Probability
2.3.2 Meaning of Probability
2.3.3 Equal-Likelihood Model
2.3.4 Probabilities and Percentages
2.3.5 Basic Properties of Probability events Notation and
Graphical Displays of Events
2.2.3.1 Relationships among Events
2.2.3.2 Mutually Exclusive Events
2.3.6 Some Rules of Probability
2.3.7 Conditional Probabilities
2.3.8 Conditional Rule
2.3.9 Multiplication Rule
2.3.10 Statistical Independence
2.4 Self-Assessment Exercise(s)
2.5 Conclusion
2.6 Summary
2.7 References/Further Readings
2.1 Introduction
Up to this point and in the previous study units, we have been dwelling
in the realm of descriptive statistics. This concerns mainly the
techniques of organizing and summarizing data.
You will gradually learn how to apply the basic principles of probability
theory to solve problems in inferential statistics in later units with the
foundation developed in this unit. We will now give a more lucid
definition of probability in the next part of the study unit.
One basic term here is an event. You need to understand that an event is some specified result that may or may not occur when an experiment is performed. For instance, when you toss a coin, a head may or may not occur.
Example 1.1
Answer
The event that the student chosen is 20 years old can occur in seven ways because there are only seven students in the class who are 20 years old,
hence for the event. Therefore, the probability that the student is
20 years old is
You need to note that the procedure you have learned for computing
probabilities is applicable to experiments with possible outcomes
equally likely to occur. If this is not the case, you must use other
methods to determine probabilities.
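For equally likely outcomes, the probability of an event is the number of ways the event can occur divided by the total number of possible outcomes. A minimal Python sketch follows. The count of seven students aged 20 comes from the example above, while the class size of 40 is a hypothetical value assumed only for this illustration, and the function name is likewise illustrative.

```python
from fractions import Fraction

def equal_likelihood_probability(ways: int, total_outcomes: int) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    if total_outcomes <= 0 or not 0 <= ways <= total_outcomes:
        raise ValueError("need 0 <= ways <= total_outcomes and total > 0")
    return Fraction(ways, total_outcomes)

students_aged_20 = 7    # from the example
class_size = 40         # hypothetical class size, assumed for illustration

probability = equal_likelihood_probability(students_aged_20, class_size)
print(probability, float(probability))  # 7/40 0.175
```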
Events
You have learned to use the word event rather intuitively. The precise
definition in probability is that an event consists of a collection of
outcomes.
You will notice that several distinct events can be attached to this card-selection process.
12.3.1 Notation and Graphical Displays for Events
You should observe that to each event E there corresponds another event E’ (the complement of E), which is the event that E does not occur. In addition, you need to be aware that for any two events A and B there exist the event that both A and B occur (i.e., $A \cap B$) and the event that A or B occurs (i.e., $A \cup B$).
These three new events arising from events A and B can be illustrated by the following Venn diagram:
Two or more events are said to be mutually exclusive if no two of them can occur together when an experiment is performed.
The notation P(A) is used to represent the probability that event A occurs. For example, if A is the event that a die comes up even, you can write P(A) = 0.5, since the probability of event A occurring is 0.5.
In summary, if E is an event, then P(E) stands for the probability that event E occurs.
Example 1.2
Answer
As you can observe from Table 12.1, the event of treatment of cancer in
this hospital can be represented by . Events and are mutually
exclusive and so by the special addition rule we have
( ) ( ) ( ) ( )
The special addition rule is valid only for events that are mutually
exclusive. In the case of events that are not mutually exclusive you will
need to use a different rule. This is the general addition rule, which
states that:
If A and B are any two events, then
$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
i.e. the probability that either event A or event B occurs equals the
probability that event A occurs plus the probability that event B occurs
minus the probability that both occur. We will illustrate this rule with
the following example:
Example 1.3
Answer
Suppose
N = event the patient admitted is male
E = event the patient admitted is under 18
The event that the patient admitted is either male or under 18 can be represented by (N or E). We wish to find P(N or E). By the general addition rule,
$$P(N \text{ or } E) = P(N) + P(E) - P(N \text{ and } E) = 0.93$$
i.e. 93% of those admitted in 1999 were either male or under 18.
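The general addition rule can be sketched in Python as follows. The three input probabilities are hypothetical values, chosen only so that the rule reproduces the 93% quoted in the example; the actual figures from the hospital data are not shown in the text.

```python
from fractions import Fraction

# Hypothetical probabilities (not taken from the source data).
p_male = Fraction(60, 100)               # P(N): patient is male
p_under_18 = Fraction(45, 100)           # P(E): patient is under 18
p_male_and_under_18 = Fraction(12, 100)  # P(N and E): both occur

# General addition rule: P(N or E) = P(N) + P(E) - P(N and E)
p_male_or_under_18 = p_male + p_under_18 - p_male_and_under_18

print(float(p_male_or_under_18))  # 0.93
```

Subtracting P(N and E) avoids counting patients who are both male and under 18 twice.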
You may also understand this concept in the following manner. Let A and B be events; then the probability that B occurs given that A has occurred is said to be a conditional probability. It is denoted by $P(B/A)$. We may illustrate the concept with the following example.
Example 1.4
( )
( )
In summary, if A and B are any two events, then
$$P(B/A) = \frac{P(A \text{ and } B)}{P(A)}$$
This rule states that the conditional probability that event B occurs given that event A has occurred is equal to the joint probability of events A and B divided by the probability of event A.
In the previous part of this unit, you saw that the conditional probability rule is used for computing conditional probabilities in terms of unconditional probabilities, i.e.
$$P(B/A) = \frac{P(A \text{ and } B)}{P(A)}$$
If both sides of this formula are multiplied by P(A), you will obtain a formula for computing joint probabilities in terms of marginal and conditional probabilities, i.e.
$$P(A \text{ and } B) = P(A) \cdot P(B/A)$$
for any two events A and B.
This formula is referred to as the general multiplication rule. It states that the probability that both event A and event B occur equals the probability that event A occurs times the probability that event B occurs given that event A has occurred.
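Both rules can be illustrated with the roll of a fair die, where all six outcomes are equally likely. In the sketch below, event A (the die comes up even) follows the earlier illustration in this unit, while event B (the die shows more than 3) is a hypothetical event introduced only for this example.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}                  # one roll of a fair die
A = {o for o in outcomes if o % 2 == 0}        # die comes up even
B = {o for o in outcomes if o > 3}             # hypothetical second event

def prob(event: set) -> Fraction:
    """Probability of an event under the equal-likelihood model."""
    return Fraction(len(event), len(outcomes))

p_a = prob(A)                    # P(A) = 1/2
p_a_and_b = prob(A & B)          # P(A and B): outcomes 4 and 6, so 1/3
p_b_given_a = p_a_and_b / p_a    # conditional probability rule: 2/3

# General multiplication rule: P(A and B) = P(A) * P(B/A)
assert p_a * p_b_given_a == p_a_and_b

print(p_a, p_a_and_b, p_b_given_a)  # 1/2 1/3 2/3
```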
Example 1.5
Answer
You should observe that the unconditional probability that event B
occurs equals
( )
Exercise 2.1
2.5 Conclusion
In this unit, you have learned the simple concepts of probability theory.
This is the foundation of future study of inferential statistics.
2.6 Summary
Unit Structure
3.0 Introduction
3.1 Intended Learning Outcomes (ILOs)
3.2 Main Content
3.2.1 Discussion on Population and Samples
3.2.2 Parameters and Statistics
3.2.3 Types of Distribution
3.3 Self-Assessment Exercise(s)
3.4 Conclusion
3.5 Summary
3.6 References/Further Readings
3.0 Introduction
In the earlier part of this course, you came across some numerical
measures that are usually employed in descriptive statistics. You will
now recall that the concepts of probability learned in the last unit are
said to be fundamental to the second aspect (i.e. methods of inferential
statistics) which we are about to begin in this unit.
You will understand these terms more suitably with the following
illustration:
Example 3.1
After the distinction between the two types of population, your next step
is to generalize your results on the population sampled to the target
population. You need to be cautious at this stage because the
generalization may be open to controversy.
In view of this, you must be assured that the characteristics of both the
population sampled and the target populations are identical. Recall that
in a previous study unit it was mentioned that a statistical study of an
entire population is usually impossible because of the following reasons:
You will also recall from units 2 and 3 that the aim of selecting a sample
is to ensure that the observations are unbiased. This is done through
random sampling. Remember that we have defined and examined these
procedures in the aforementioned study units.
Another important point you have to note in the conduct of research is that the values of the population parameters are usually unknown, since only a part of the population is observed.
On the other hand, the frequency distribution of all the measurements in an entire population is said to be a theoretical distribution. Again, a theoretical distribution is rarely obtainable in practice because it is impractical to measure all elements in the population. Hence an empirical frequency distribution for the sample measurements approximates the theoretical frequency distribution of all population measurements.
In the next unit, you will learn the properties of one of the most
frequently used continuous theoretical frequency distributions.
3.4 Conclusion
In this unit, you have learned about the relationship between populations
and samples. You also learned that in drawing conclusions about a
3.5 Summary
Unit Structure
4.0 Introduction
4.1 Intended Learning Outcomes (ILOs)
4.2 Main Content
4.2.1 Discussion on the Normal Curve
Properties of the Normal Curve
4.2.2 Standard Normal Curve
4.2.3 Using the Standard Normal Table
4.2.4 Find areas under Normal Curve using the Standard
Normal Table
4.3 Self-Assessment Exercise(s)
4.4 Conclusion
4.5 Summary
4.6 References/Further Readings
4.0 Introduction
In this unit, you will learn the properties and applications of the most important continuous probability distribution. This is the so-called bell-shaped curve or the normal distribution.
From previous units, you can recall that the center points or location of a
normal curve depends on the value of the mean and the variability of
observations depends on the value of the standard deviation. You will
now acquaint yourself with other properties of the normal curve as
stated in the next part of the unit as well in the accompanying diagrams.
You will also notice that the center point of a normal curve depends on the value of the mean and that the variability of the values depends on the value of the standard deviation. For a distribution of measurements with a small standard deviation, the curve is tall and thin, while for a distribution with a large standard deviation, the curve is short and fat (see Figure 4.2).
You will also observe that the normal curves of two populations with the same mean but different standard deviations have different shapes, as shown in Figure 4.3.
Figure 4.3: illustration of Normal Curves with the same Mean but
Different Standard Deviation
Such areas of the smooth curve are obtainable from a table of normal
distribution values given at the end of this study unit. The procedure will
be illustrated in the next part of this unit.
There are infinitely many normal curves. One particular normal curve is the standard normal curve, or z-curve. The areas under normal curves can be found once we know how to determine areas under the standard normal curve.
The horizontal axis under the standard normal curve is labeled with the letter z. The standard normal curve is shown in Figure 4.4 and some of its more important properties are:
(i) The total area under the standard normal curve is equal to 1.
(ii) The standard normal curve extends indefinitely in both directions, approaching but never touching the horizontal axis.
(iii) The standard normal curve is symmetric about 0.
(iv) Most of the area under the standard normal curve lies between -3 and 3.
Tables of areas under the standard normal curve have been constructed because of their importance. This table is at the end of this unit and it consists of four-decimal-place numbers in the body of the table. These numbers give the areas under the standard normal curve that lie to the left of a given value of z. The left page of the table is for negative values of z while the right page is for positive values of z.
The following example explains its use:
Example 4.1
Find the area under the standard normal curve that lies to the left of 1.32
Answer
Figure 4.5: Determining the area under the standard normal curve to the left of 1.32
From the table on the right page (1.32 is positive), you go down the left-hand column labeled z to 1.3. Then from there go across that row until you are under 0.02 in the top row. The number in the body of the table there is 0.9066. This is the area (shaded in the diagram) under the standard normal curve that lies to the left of 1.32.
You have just seen one use of the Standard Normal Table. Two other important uses of that table are finding the area to the right of a given value of z and finding the area between two given values of z. We illustrate these other uses of the table in the following examples:
Example 4.2
Find the area under the standard normal curve that lies to the right of
0.85
Answer
Figure 4.6: Determining the area under the standard normal curve to the right of 0.85
Since the total area under the standard normal curve is 1, the area to the
right of 0.85 equals 1 minus the area to the left of 0.85. You will go
down the left-hand column labeled z to 0.8. Next, go across that row
until you are under 0.05 in the top row. The number in the body of the
table there is 0.8023. This is the area under the standard normal curve
that lies to the left of 0.85. Hence the area under the standard normal
curve that lies to the right of 0.85 is 1-0.8023=0.1977.
Example 4.3
Find the area under the standard normal curve that lies between -0.57
and 1.63.
Figure 4.7: Determining the area under the standard normal curve that
lies between -0.57 and 1.63
The area under the standard normal curve that lies between -0.57 and 1.63 equals the area to the left of 1.63 minus the area to the left of -0.57. The area to the left of 1.63 is 0.9484 and that to the left of -0.57 is 0.2843 (you will obtain this area from the left page of the table since -0.57 is negative). The area under the standard normal curve that lies between -0.57 and 1.63 is therefore 0.9484 - 0.2843 = 0.6641.
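The table look-ups in Examples 4.1 to 4.3 can be cross-checked in Python. This sketch uses `NormalDist` from the standard library `statistics` module (available from Python 3.8), whose `cdf` method gives the area to the left of a value. It is offered as a check on the table readings, not as a replacement for learning to use the table.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

area_left_of_1_32 = z.cdf(1.32)              # Example 4.1
area_right_of_0_85 = 1 - z.cdf(0.85)         # Example 4.2
area_between = z.cdf(1.63) - z.cdf(-0.57)    # Example 4.3

print(round(area_left_of_1_32, 4))   # 0.9066
print(round(area_right_of_0_85, 4))  # 0.1977
print(round(area_between, 4))        # 0.6641
```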
You can also find the z-value(s) corresponding to a specified area under
that standard normal curve by simply reversing the steps taken in
examples 14.
You will also recall that two normal curves that have the same µ parameter are centered at the same place, while two normal curves that have the same σ parameter have the same shape.
Example 4.4
Determine the area under the normal curve with parameters µ = 5 and σ = 2 that lies
(i) To the right of 6;
(ii) Between 2 and 7.
Answer
(i) The first step is for you to sketch the normal curve with parameters µ = 5 and σ = 2 and shade the area to the right of 6. You will label the horizontal axis for the normal curve as x and that for the standard normal curve as z.
Figure 4.8
(a) Area under the normal curve with parameters, µ=5 and σ =2 that
lies to the right of 6.
(b) Area under the standard normal curve that lies to the right of 0.5
You will now obtain the area under the normal curve by
converting x-values to z-values by first subtracting µ and
then dividing by σ. This conversion process is referred to as
standardizing. You then obtain in (i)
$$z = \frac{x - \mu}{\sigma} = \frac{6 - 5}{2} = 0.5$$
i.e. the areas of the shaded portions in Figures 4.8(a) and 4.8(b) are equal.
The area shaded in Figure 4.8(b) is 1 - 0.6915 = 0.3085. Hence the area
shaded in Figure 4.8(a) is also 0.3085. That is, the area under the normal
curve with parameters µ=5 and σ =2 that lies to the right of 6 is 0.3085.
(ii) In this case you need to find the area under the normal curve with
parameters µ=5 and σ =2 that lies between 2 and 7 as shown in
Figure 4.9(a).
Figure 4.9
(a) Area under the normal curve with parameters, µ=5 and σ =2 that
lies between 2 and 7.
(b) The area under the normal curve with parameters, µ=5 and σ =2 that
lies between x = 2 and x = 7 is equal to the area under the
standard normal curve that lies between
$$z = \frac{2 - 5}{2} = -1.5 \quad\text{and}\quad z = \frac{7 - 5}{2} = 1.$$
Using the table, the shaded area in Figure 4.9(b) equals 0.8413 - 0.0668 =
0.7745.
In summary, you will determine the area under the normal curve with
parameters µ and σ that lies between a and b as being equal to the area
under the standard normal curve that lies between
$$\frac{a - \mu}{\sigma} \quad\text{and}\quad \frac{b - \mu}{\sigma}.$$
Also you will find the area under the normal curve with parameters µ
and σ that lies to the right (or left) of a particular x-value by first
converting to the z-score and then using the Standard Normal table to
find the area under the standard normal curve that lies to the right (or
left) of the z-score.
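The standardizing rule summarized above can be sketched in a few lines of Python. This is only an illustration, assuming statistics.NormalDist from the Python standard library in place of the Standard Normal table; the function names are hypothetical.

```python
from statistics import NormalDist


def standardize(x, mu, sigma):
    """Convert an x-value to a z-score: subtract mu, then divide by sigma."""
    return (x - mu) / sigma


def normal_area_between(a, b, mu, sigma):
    """Area under the normal curve (mu, sigma) between a and b,
    found as the standard normal area between the two z-scores."""
    std = NormalDist()
    return std.cdf(standardize(b, mu, sigma)) - std.cdf(standardize(a, mu, sigma))


def normal_area_right(x, mu, sigma):
    """Area under the normal curve (mu, sigma) to the right of x."""
    return 1 - NormalDist().cdf(standardize(x, mu, sigma))


# Example 4.4 with mu = 5 and sigma = 2.
print(standardize(6, 5, 2))                       # 0.5
print(round(normal_area_right(6, 5, 2), 4))       # 0.3085
print(round(normal_area_between(2, 7, 5, 2), 4))  # 0.7745
```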
4.4 Conclusion
4.5 Summary
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
5.0 Introduction
5.1 Intended Learning Outcomes (ILOs)
5.2 Main Content
5.2.1 Sampling Error: The Need for Sampling Distribution
5.2.2 Derived Distribution-Generation
5.2.3 Properties of the Distribution of Sample Means
5.2.4 Central Limit Theorem
5.3 Self-Assessment Exercise(s)
5.4 Conclusion
5.5 Summary
5.6 References/Further Readings
5.0 Introduction
However, you are aware that since a sample from a population provides
data for only a portion of the entire population, it is unlikely the sample
yields perfectly accurate information about the population.
Example 5.1
possible sample means of size 625 from the given population of systolic
blood pressure measurements.
Answer
5.4 Conclusion
In this unit, you studied one of the most important foundations of
statistical inference, the sampling distribution of the mean. Its
properties were explained, and one of them, the Central Limit Theorem,
was applied to a problem.
5.5 Summary
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Introduction
1.0 Introduction
1.1 Intended Learning Outcomes (ILOs)
1.2 Main Content
1.2.1 Estimating a Population Mean
1.2.2 Interpretation of Confidence Intervals on µ
1.2.2.1 Confidence Interval for one Population Mean when
σ is Unknown
1.2.2.2 Obtaining Confidence Intervals for a
Population Mean
1.2.2.3 Unknown Confidence Intervals on the
Difference in Two
1.3 Self-Assessment Exercise(s)
1.4 Conclusion
1.5 Summary
1.6 References/Further Readings
1.0 Introduction
In this unit, you will begin the study of inferential statistics. This
consists of a set of procedures used to draw conclusions about a large
There are two main areas of statistical inference namely estimation and
testing of hypothesis. In this unit you will examine methods for
estimating the mean of a population.
You need to recall from the last study unit that it is unreasonable to
expect that a sample mean will exactly equal the population mean µ.
You should anticipate some sampling error. It is therefore necessary that
in addition to reporting a point estimate of µ, you need to furnish
information on the accuracy of the estimate.
You will now learn a technique for obtaining a confidence interval for a
population mean. Before proceeding, you will need to recall the
following Key Facts:
Example 1.1
Answer
Stated in words, the last equation means that the probability is 0.9544
that $\bar{x}$ will lie within $2\sigma/\sqrt{n}$ of µ. It also means that µ
will lie within $2\sigma/\sqrt{n}$ of $\bar{x}$.
You need to realize that the random variable is $\bar{x}$, not µ. In addition,
you should note that the population mean µ is a fixed number, although it
may be unknown, whereas the sample mean $\bar{x}$ is a random variable whose
value depends on chance, i.e. on which sample is obtained.
(ii). One interpretation is that about 95.44% of all samples of size n have
the property that the interval with endpoints $\bar{x} \pm 2\sigma/\sqrt{n}$
contains the population mean µ. Another interpretation is that if you take a large
The standardized random variable
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
has the standard normal distribution.
But in practice, the population standard deviation σ is unknown and as
such you cannot base your confidence-interval procedure on σ (and
thus on z). However, since the sample standard deviation s is a point
estimate of the population standard deviation σ, you can replace σ by s in
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$
and base your confidence-interval method on the resulting random
variable. Your next task is to identify the probability distribution of this
new random variable, i.e.
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
In carrying out this task, you will assume that the population being
sampled is normally distributed. The obvious problem with the
substitution of s for σ is that, in addition to the variability of $\bar{x}$,
s now varies with each sample. This additional variability in s is taken
into account by use of the Student t-distribution.
Example 1.2
For a t-curve with 13 degrees of freedom, determine t0.025, i.e. find the
t-value having area 0.025 to its right.
Answer
To find the t-value in question we use the Table whose portion has been
repeated here for reference:
Table 16: Values of tα
df    t0.10    t0.05    t0.025   t0.01    t0.005   df
-     -        -        -        -        -        -
12    1.356    1.782    2.179    2.681    3.055    12
13    1.350    1.771    2.160    2.650    3.012    13
14    1.345    1.761    2.145    2.624    2.977    14
15    -        -        -        -        -        -
Reading across the row for df = 13 to the column headed t0.025 gives
t0.025 = 2.160.
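Entries of the t-table can be checked numerically. The sketch below is one possible approach, not the method used to build the table: it assumes the usual density formula for Student's t-distribution, integrates it with Simpson's rule, and searches for the critical value by bisection. All function names are illustrative.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def t_right_tail_area(t, df, steps=2000):
    """Area to the right of t (for t >= 0), using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # The curve is symmetric about 0, so the area to the right of 0 is 0.5.
    return 0.5 - total * h / 3


def t_critical(alpha, df):
    """Find t_alpha, the t-value with area alpha to its right, by bisection."""
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_right_tail_area(mid, df) > alpha:
            low = mid   # too much area to the right: move right
        else:
            high = mid  # too little area to the right: move left
    return (low + high) / 2


# Example 1.2: df = 13, area 0.025 to the right (table entry 2.160).
print(round(t_critical(0.025, 13), 3))
```

In practice a statistics package would be used for this look-up; the hand-rolled integration is shown only to make the meaning of "area to the right" concrete.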
Having discussed the t-distribution and t-curves, you can now see how to
develop a method to obtain a confidence interval for a population mean
µ when the population being sampled is normally distributed and the
population standard deviation is unknown. The procedure is as follows:
Let a random sample of size n be taken from a normally distributed
population with mean µ, and let s be the sample standard deviation. Then
the random variable
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
has the t-distribution with (n-1) degrees of freedom, i.e. probabilities for
that random variable are equal to areas under the t-curve with df = n-1.
Therefore,
$$P\left(-t_{\alpha/2} < \frac{\bar{x} - \mu}{s/\sqrt{n}} < t_{\alpha/2}\right) = 1 - \alpha.$$
This equation may be rewritten as
$$P\left(\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}\right) = 1 - \alpha.$$
Hence, once the sample is taken, the interval from
$$\bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}} \quad\text{to}\quad \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
will be a (1-α)-level confidence interval for µ.
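The interval just described can be computed as in the sketch below. It assumes that the critical value is looked up by hand in the t-table and passed in as t_crit. The function name and the 14 data values are hypothetical and serve only as an illustration (with n = 14, df = 13, so the table value for a 95% interval is 2.160).

```python
import math
import statistics


def t_confidence_interval(sample, t_crit):
    """Confidence interval for a population mean when sigma is unknown.

    sample: list of observations from a (roughly) normal population
    t_crit: t-table value t_(alpha/2) for df = len(sample) - 1
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)         # sample standard deviation (divisor n - 1)
    margin = t_crit * s / math.sqrt(n)   # t_(alpha/2) * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical sample of 14 measurements (not data from the course text).
sample = [61.2, 60.4, 59.8, 62.1, 60.9, 61.5, 60.2,
          59.9, 61.8, 60.7, 61.1, 60.3, 61.6, 60.5]
print(t_confidence_interval(sample, 2.160))
```

Passing the critical value as an argument keeps the sketch close to the manual procedure: the only step left to the table is finding the t-value for the chosen confidence level.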
Example 1.3
Obtain a 95% confidence interval for the mean gestation period of the
domestic dog. [This example is taken from Elementary Statistics by
N.A. Weiss.]
Answer
For a confidence level of 1-α, you will use the t-table to find $t_{\alpha/2}$
with df = n-1, where n is the sample size.
You can then be 95% confident that the mean gestation period µ of the
domestic dog is somewhere between 60.19 and 61.18 days.
In the previous section of the study unit, you saw the rationale and the
method for obtaining a confidence interval on a single population mean µ.
In medical and nursing practice, it is usual to estimate the true difference
in two population means µ1 - µ2, where µ1 and µ2 are the true means of
the first and second populations, respectively.
You need to be aware that it can be shown by rigorous proof that
if all possible samples of size n1 are drawn from the first population of
size N1 and all possible samples of size n2 are drawn from the second
population of size N2, and from these samples all possible
differences $\bar{x}_1 - \bar{x}_2$ are computed, then the frequency
distribution of these differences is that of the normal distribution with
mean µ1 - µ2 and standard deviation
$$\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},$$
where σ1 and σ2 are the standard deviations of the two populations.
You should note that if the two sampled populations are not normally
distributed or if the form of the frequency distributions of the
populations is unknown, the sampling distribution of the differences
$\bar{x}_1 - \bar{x}_2$ is at least approximately normal with mean µ1 - µ2
and standard deviation $\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}$ for large
sample sizes n1 and n2.
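The 95% and 99% confidence intervals for µ1 - µ2 are then, respectively,
$$(\bar{x}_1 - \bar{x}_2) \pm 1.96\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$$(\bar{x}_1 - \bar{x}_2) \pm 2.58\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$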
When interpreted, you may be 95% (or 99%) confident that the interval
spans the true differences in population means µ1 - µ2.
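A large-sample interval for the difference in two population means can be sketched as follows. The sketch assumes the usual z multipliers (1.96 for 95% and 2.58 for 99% confidence) and known population standard deviations. The function name and all numbers in the usage lines are hypothetical.

```python
import math


def diff_means_ci(x_bar1, x_bar2, sigma1, sigma2, n1, n2, z=1.96):
    """Confidence interval for mu1 - mu2 from two independent samples.

    z = 1.96 gives a 95% interval; z = 2.58 gives a 99% interval.
    """
    # Standard deviation of the difference in sample means.
    std_error = math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = x_bar1 - x_bar2
    return diff - z * std_error, diff + z * std_error


# Hypothetical summary figures for two groups of patients.
print(diff_means_ci(128.0, 121.5, 15.0, 14.0, 100, 120))          # 95% interval
print(diff_means_ci(128.0, 121.5, 15.0, 14.0, 100, 120, z=2.58))  # 99% interval
```

The 99% interval is always wider than the 95% interval for the same data, since a larger multiplier is applied to the same standard deviation.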
1.4 Conclusion
In this unit, you learned that having observed a mean, or any other
numerical summary that may be computed from a sample of
observations, it is possible to give limits within which the corresponding
value in the population lies, with a known degree of confidence.
1.5 Summary
(̅ ̅ ) √
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
2.0 Introduction
2.1 Intended Learning Outcomes (ILOs)
2.2 Main Content
2.2.1 Nature of Hypothesis Testing
2.2.2 Choosing the Hypothesis
2.2.3 Logic of Hypothesis Testing
2.2.3.1 Terms, Errors and Hypothesis
2.2.3.2 Type I and Type II Errors
2.2.3.3 Probabilities of Type I and Type II Errors
2.2.4 Possible conclusion for a Hypothesis Test
2.3 Self-Assessment Exercise(s)
2.4 Conclusion
2.5 Summary
2.6 References/Further Readings
2.0 Introduction
This study unit and the two succeeding units will focus on how you can
employ sample means to make decisions about hypothesized
values of a population mean µ. For instance, you might wish to use the
mean success of treatment of tuberculosis for a sample of people in the
penultimate year to generalize to all such people last year. Such
statistical inference lies in the realm of hypothesis testing.
This study unit will discuss the fundamentals of hypothesis testing, while
you will learn standard procedures employed in hypothesis testing in
the next two study units. You will need to examine the objectives for this
unit stated hereunder.
Your first step in a hypothesis test is to decide on the null and the
alternative hypotheses. Suppose your hypothesis test is specifically for
one population mean µ.
Example 2.1
Suppose a clinical investigator wishes to determine whether infants with
Disorder X have the same birth weight as normal infants. The mean birth
weight of all normal infants is 7 pounds. For a random sample of 25
infants, suppose the mean birth weight is calculated to be 6.2 pounds.
(i) Determine the null hypothesis for the hypothesis test.
You have learned how to choose suitable null and alternative hypotheses
for a hypothesis test. The next step is how to decide on acceptance or
rejection of the null hypothesis in favor of the alternative hypothesis.
To do this you need to take a random sample from the population and
determine the consistency or otherwise of the sample data with the null
hypothesis. If the sample data are consistent with the null hypothesis,
the null hypothesis is accepted. On the other hand, if the sample data are
inconsistent with the null hypothesis, the null hypothesis is rejected in
favor of the alternative hypothesis.
The next example gives a precise criterion for deciding whether or not to
reject the null hypothesis.
Example 2.2
Answer
(i)
Test Statistic
√ √
Where =3860, µ = 140, , σ = 25
(ii) The sample mean is 7.2 standard deviations above the null
hypothesis mean of 140. (The assumption that the random
variable is normally distributed has been made).
(iii) We conclude that the null hypothesis should be rejected in favor of
the alternative hypothesis.
The quantity
$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}},$$
computed with µ equal to the null hypothesis mean, is called the test
statistic for the hypothesis test. The following graph portrays the
criterion used to decide on the rejection or otherwise of the null hypothesis:
Fig 2.1
The set of values for the test statistic that leads to rejection of the null
hypothesis is called the rejection region.
In this case, the rejection region consists of all z-values that lie either to
the left of the lower cut-off value or to the right of the upper cut-off value,
that is, that part of the horizontal axis under the shaded area in Figure 2.1.
The set of values for the test statistic that does not lead to the rejection
of the null hypothesis is called the non-rejection or acceptance region.
The values of the test statistic that separate the rejection and
non-rejection regions are called the critical values. In the example, the
critical values are as shown in Figure 2.2 below:
Fig 2.2
You need to be aware that for
(i) A two-tailed test, the null hypothesis will be rejected if the test
statistic is either too small or too large. It therefore follows that
the rejection region for such a test consists of two parts, one on
the left and one on the right.
(ii) A left-tailed test, the null hypothesis will be rejected only if the
test statistic is too small. Thus, the rejection region for such a test
consists of the part on the left.
(iii) A right-tailed test, the null hypothesis will be rejected only if the
test statistic is too large. Thus, the rejection region for such a test
consists of that part on the right.
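The three cases above amount to a simple decision rule, sketched below in Python. The function name is hypothetical, the critical value is assumed to be supplied as a positive number, and the value 1.96 in the usage lines is used purely for illustration.

```python
def in_rejection_region(z, critical, tail):
    """Return True if the test statistic z falls in the rejection region.

    critical: positive critical value (its negative is used on the left)
    tail: "two", "left" or "right"
    """
    if critical <= 0:
        raise ValueError("critical must be a positive number")
    if tail == "two":
        return z < -critical or z > critical   # too small or too large
    if tail == "left":
        return z < -critical                   # too small only
    if tail == "right":
        return z > critical                    # too large only
    raise ValueError("tail must be 'two', 'left' or 'right'")


# A test statistic of 7.2 (as in Example 2.2) with an illustrative
# critical value of 1.96:
print(in_rejection_region(7.2, 1.96, "two"))    # True
print(in_rejection_region(7.2, 1.96, "left"))   # False
```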
                     Null hypothesis is:
Decision             True                 False
Accept               Correct Decision     Type II error
Reject               Type I error         Correct Decision
You need to notice that a type I error results from rejecting the null
hypothesis when it is in fact true, while a type II error results from not
rejecting the null hypothesis when it is in fact false.
Another point you must also observe is that the null and the alternative
hypotheses in a hypothesis test are exhaustive. That is, if the null
hypothesis is false the alternative hypothesis is true and vice versa.
From the points raised above, you will notice that the probability of
making a type I error is that of rejecting a true null hypothesis, that is, the
probability that the test statistic will be in the rejection region if indeed
the null hypothesis is true. This probability is denoted by α and is called
the significance level of the test.
Similarly, the probability of making a type II error is that of the test
statistic being in the non-rejection region if indeed the null hypothesis is
false. This is denoted by β and it depends on the true value of the
population mean µ.
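To make the dependence of the type II error probability on the true mean concrete, here is a minimal sketch. It assumes a left-tailed z-test with known σ and uses statistics.NormalDist from the Python standard library. The function name and every number in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error_left_tailed(mu0, mu_true, sigma, n, alpha):
    """Probability of not rejecting H0: mu = mu0 in a left-tailed z-test
    when the population mean is actually mu_true."""
    std = NormalDist()
    std_error = sigma / math.sqrt(n)
    z_crit = std.inv_cdf(alpha)            # negative critical value
    x_cutoff = mu0 + z_crit * std_error    # H0 is rejected when x_bar < x_cutoff
    # Non-rejection happens when the sample mean is at or above the cutoff.
    return 1 - std.cdf((x_cutoff - mu_true) / std_error)


# Hypothetical figures: H0 mean 800, sigma 180, n = 18, alpha = 0.05.
for mu_true in (800, 760, 720, 680):
    print(mu_true, round(type_ii_error_left_tailed(800, mu_true, 180, 18, 0.05), 3))
```

The further the true mean lies below the hypothesized value, the smaller the printed probability becomes. When the true mean equals the hypothesized mean, the value is 1 - α, which is the probability of correctly not rejecting a true null hypothesis.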
(ii) If the hypothesis is not rejected, you can conclude that the data do
not provide sufficient evidence to support the alternative
hypothesis.
The critical concepts learned in this study unit include the following:
1. Statistical inference
2. Hypothesis testing
3. Test statistic
4. Null and alternative hypotheses
5. Rejection and non-rejection region
6. Critical values
7. Level of significance
8. Type I and II errors
2.4 Conclusion
2.5 Summary
1. The Ministry of Health’s Food and Drug Agency states that the
recommended daily allowance (RDA) of iron for adult females
under the age of 50 is 18 mg. A hypothesis test is to be performed
to decide whether adult females under the age of 50 are, on the
average, getting less than the RDA of 18 mg of iron.
(i) Determine the null and alternative hypothesis for the hypothesis
test.
(ii) Classify the hypothesis test as two-tailed, left-tailed or right-
tailed.
2. Use the information in Exercise 1 to explain what each of the
following would mean:
(i) Type I error
(ii) Type II error
(iii) Correct decision
3. Decide whether each of the following statements is true or false.
Explain your answer.
(i) If it is important not to reject a true null hypothesis, then, the
hypothesis test should be performed at a small significance level.
(ii) For a fixed sample size, a decrease in the significance level of a
hypothesis test results in an increase in the probability of making
type II error.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Unit Structure
3.0 Introduction
3.1 Intended Learning Outcomes (ILOs)
3.2 Main Content
3.2.1 Discussion
3.2.2 Obtaining the Critical Value(s) for a Specific Significance
Level
3.2.3 Procedure for a Hypothesis Test for a Population Mean
when σ is known
3.3 Self-Assessment Exercise(s)
3.4 Conclusion
3.5 Summary
3.6 References/Further Readings
3.0 Introduction
In this study unit, you will learn a simple method for performing a
hypothesis test for a population mean µ, when the standard deviation of
the population, σ, is known.
3.2.1 Discussion
You need to be acquainted with how to obtain the critical value(s) for a
hypothesis test when the significance level, α, is specified a priori.
You will recall that the significance level, α, of a hypothesis test is the
probability of making a Type I error which is the same as the probability
of rejecting a true null hypothesis. An equivalent statement is that α is
the probability of the test statistic lying in the rejection region if indeed,
the null hypothesis is true.
It therefore follows that this is the key to determining the critical values
for a specified significance level, which you will learn in what follows.
You will learn this procedure starting with the following key point:
If a hypothesis test is to be carried out at a significance level α, then the
critical values are chosen such that, if the null hypothesis is true, the
probability that the test statistic lies in the rejection region equals α.
The null hypothesis for a test regarding one population mean µ is of the
form
H0 : µ = µ0 (µ0 is a number)
The statistic for the test is
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
Example 3.1
Determine the critical value(s) for a (i) two-tailed test (ii) left-tailed test
(iii) right-tailed test.
Answer
For α = 0.05, you will choose the critical value(s) so that the area under
the standard normal curve lying above the rejection region equals 0.05.
(i) The rejection region for a two-tailed test is on both the left and
right. Hence, for a test with α = 0.05, the z-values that divide the
area under the standard normal curve into a middle 0.95 area and
two outside areas of 0.025 are the required critical values. These
are $\pm z_{0.025}$, which we find from the tables to be ±1.96.
(ii) For a left-tailed test, the rejection region is on the left. Hence, for
a test with α = 0.05, the critical value is $-z_{0.05} = -1.645$ from the tables.
(iii) For a right-tailed test, the rejection region is on the right. Thus,
for a test with α = 0.05, the critical value is $z_{0.05} = 1.645$ from the tables.
These regions are depicted in Fig 3.1.
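The critical values for Example 3.1 can also be obtained numerically. The sketch below assumes statistics.NormalDist.inv_cdf from the Python standard library as a replacement for reading the table backwards; the function name critical_values is illustrative.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Critical z-value(s) for a test at significance level alpha.

    tail: "two", "left" or "right". Returns a tuple of critical values.
    """
    std = NormalDist()
    if tail == "two":
        z = std.inv_cdf(1 - alpha / 2)   # area alpha/2 in each tail
        return (-z, z)
    if tail == "left":
        return (std.inv_cdf(alpha),)     # area alpha to the left
    if tail == "right":
        return (std.inv_cdf(1 - alpha),) # area alpha to the right
    raise ValueError("tail must be 'two', 'left' or 'right'")


for tail in ("two", "left", "right"):
    print(tail, [round(z, 3) for z in critical_values(0.05, tail)])
# two [-1.96, 1.96]
# left [-1.645]
# right [1.645]
```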
You need to note that the most frequently used significance levels are
0.10, 0.05 and 0.01. You will now learn the procedure for performing a
hypothesis test for a population mean when the population standard
deviation is specified.
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
Step 5: If value of test statistic lies in the rejection region, reject H0;
otherwise, do not reject H0.
Step 6: State the conclusion in words.
The following example illustrates the procedure outlined.
Example 3.2
Answer
√ √
Step 5: If the value of the test statistic falls in the rejection region, reject
H0.
The value of the test statistic does not fall in the rejection region and so
we do not reject H0.
Step 6: State the Conclusion
The test results are not statistically significant at the 5% level, i.e. at the
5% significance level, the sample of 18 calcium intakes does not provide
sufficient evidence to conclude that the mean calcium intake, µ, of all
people with incomes below the poverty level is less than the RDA of
800 mg.
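The whole procedure, from computing the test statistic to the decision in Step 5, can be sketched as a single function. This is an illustration only: it assumes statistics.NormalDist from the Python standard library for the critical value, and the 18 intake values and the value of σ in the usage lines are hypothetical, not the data of Example 3.2.

```python
import math
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma, alpha, tail):
    """Hypothesis test for a population mean when sigma is known.

    Returns the test statistic z and True if H0: mu = mu0 is rejected.
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / math.sqrt(n))   # test statistic
    std = NormalDist()
    if tail == "two":
        reject = abs(z) > std.inv_cdf(1 - alpha / 2)
    elif tail == "left":
        reject = z < std.inv_cdf(alpha)
    elif tail == "right":
        reject = z > std.inv_cdf(1 - alpha)
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return z, reject


# Hypothetical daily calcium intakes (mg) for 18 people; sigma assumed known.
intakes = [686, 433, 743, 647, 734, 641, 993, 620, 574,
           634, 850, 858, 992, 775, 1113, 672, 879, 609]
z, reject = z_test(intakes, mu0=800, sigma=180, alpha=0.05, tail="left")
print(round(z, 2), "reject H0" if reject else "do not reject H0")
```

Step 6, stating the conclusion in words, still has to be done by the analyst in the context of the problem.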
The Food and Agency Department of the Ministry of Health gives the
recommended daily allowance (RDA) of iron for adult females under 50
as 18 mg. The following iron intakes, in milligrams, during a 24-hour
period were obtained for 45 randomly selected adult females under the
age of 50.
3.4 Conclusion
In this unit, you have learned the simple procedure for performing a
hypothesis test for a population mean when the population standard
deviation is known.
3.5 Summary
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
4.0 Introduction
4.1 Intended Learning Outcomes (ILOs)
4.2 Main Content
4.2.1 Classical Approach
4.2.2 Statistical Significance Versus Practical Significance
4.2.3 Relation between Hypothesis Tests and Confidence
Intervals
4.2.4 P-Values
4.2.5 P-Value Approach to Hypothesis Testing
4.3 Self-Assessment Exercise(s)
4.4 Conclusion
4.5 Summary
4.6 References/Further Readings
4.0 Introduction
This study unit concludes the discussion on hypothesis testing within the
scope of this course. It examines the limitations of the classical approach
and introduces the p-value approach to hypothesis testing. Let us first
look at the objectives stated hereunder:
You will recall the procedure used in performing a hypothesis test for a
population mean when the population standard deviation, σ, is given.
Although this statistical inference is specific to a particular example, it
is important that you are
You will recall that the results of a hypothesis test are said to be
statistically significant if the null hypothesis is rejected at the chosen
level of α. The implication of this statement is that the data provided
evidence to conclude that the truth is different from that stated in the
null hypothesis.
You need to note that this does not necessarily mean that the difference
is important in any practical respect. In other words, statistical
significance does not imply practical or clinical significance.
4.2.4 P-Values
(i) It does not permit readers having access only to the conclusion of
the test to make their evaluation (i.e., select their own
significance level).
(ii) It does not provide them with the information necessary to assess
precisely the strength of the evidence against the null hypothesis.
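A P-value addresses both limitations, because it is the smallest significance level at which the observed test statistic would lead to rejection of the null hypothesis, so each reader can compare it with a significance level of their own choosing. The following is a minimal sketch for a z-test, assuming statistics.NormalDist from the Python standard library; the function name and the test statistic used in the usage lines are hypothetical.

```python
from statistics import NormalDist


def p_value(z, tail):
    """P-value of a z-test for an observed test statistic z.

    tail: "left", "right" or "two".
    """
    std = NormalDist()
    if tail == "left":
        return std.cdf(z)                  # area to the left of z
    if tail == "right":
        return 1 - std.cdf(z)              # area to the right of z
    if tail == "two":
        return 2 * (1 - std.cdf(abs(z)))   # both tails beyond |z|
    raise ValueError("tail must be 'left', 'right' or 'two'")


# Hypothetical left-tailed test with observed z = -1.20.
p = p_value(-1.20, "left")
for alpha in (0.10, 0.05, 0.01):
    print(alpha, "reject H0" if p <= alpha else "do not reject H0")
```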
Example 4.1
Answer
Let µ denote the mean calcium intake (per day) of all people with
incomes below the poverty level.
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
where µ0 = 800, σ = 180, n = 10
from the data,
∑
4.3 Conclusion
4.4 Summary
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.
Unit Structure
5.0 Introduction
5.1 Intended Learning Outcomes (ILOs)
5.2 Main Content
5.2.1 Significant Problems of Morbidity Statistics
5.2.2 Sources of Morbidity Statistics
5.2.3 Rates of Morbidity Statistics
5.3 Self-Assessment Exercise(s)
5.4 Conclusion
5.5 Summary
5.6 References/Further Readings
5.0 Introduction
1. Whether you should count the number of persons ill, the number of
illnesses, or both as the amount of morbidity.
2. Whether you need to know the number of new illnesses that arose in a
given period or the number that were extant in that period.
3. What you intend to count as morbidity in any given
circumstances. Is a sickness congenital, acquired effect, injuries,
impairment or incipient diseases revealed by test e.g.,
tuberculosis or diabetes?
4. You need to know about carriers of a disease.
You need to know the frequent sources of morbidity statistics and the
significant problems arising in each. These are as follows:
1. The survey of sickness with a representative sample of a
population keeping a diary or being interviewed so as to reveal
the details of the sickness suffered over a defined preceding
interval of time.
2. Statistics of a general practitioner of patients attending surgery or
visited at home.
3. Hospital in-patient statistics provide a firm diagnosis.
6. Sickness absence records.
7. Notifications of diseases are frequently limited to infectious
diseases.
8. Registration of all cases of diseases provides information by
which sickness in population may be identified and measured.
You will decide for each of these classes on the measure of the number
of persons sick or the number of spells of illnesses that occur. The most
meaningful morbidity rates in the total population or at specific ages will
then turn out to be:
1. The incidence rate, which is the number of illnesses beginning
within a specified period of time and related to the average
number of persons exposed to risk during that period e.g., how
many persons fell sick with typhoid fever in the sixth week of the
year?
2. The period prevalence rate which is the number of illnesses
existing at any time within a specified period and related to the
average number of persons exposed to risk during that period
e.g., how many persons were sick with malaria during the month
of December?
3. The point prevalence rate, which is the number of illnesses
existing at a specified point of time and related to the number of
persons exposed to risk at that point of time, e.g. how many
persons were sick with cholera on 24 December?
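The three rates defined above share the same arithmetic: a count of illnesses divided by the number of persons exposed to risk, usually expressed per 1,000 persons. The sketch below illustrates this. The function names, the choice of "per 1,000" and all counts are hypothetical and chosen only for illustration.

```python
def rate_per(cases, persons_at_risk, per=1000):
    """Number of cases per `per` persons exposed to risk."""
    if persons_at_risk <= 0:
        raise ValueError("persons_at_risk must be positive")
    return cases / persons_at_risk * per


def incidence_rate(new_cases, average_at_risk, per=1000):
    """Illnesses beginning within the period, per `per` persons at risk."""
    return rate_per(new_cases, average_at_risk, per)


def period_prevalence_rate(cases_existing_in_period, average_at_risk, per=1000):
    """Illnesses existing at any time within the period, per `per` persons."""
    return rate_per(cases_existing_in_period, average_at_risk, per)


def point_prevalence_rate(cases_at_point, at_risk_at_point, per=1000):
    """Illnesses existing at one point of time, per `per` persons."""
    return rate_per(cases_at_point, at_risk_at_point, per)


# Hypothetical community of 15,000 persons exposed to risk.
print(incidence_rate(30, 15000))           # new typhoid cases in week 6
print(period_prevalence_rate(450, 15000))  # malaria cases during December
print(point_prevalence_rate(12, 15000))    # cholera cases on 24 December
```

The difference between the three measures lies entirely in which illnesses are counted in the numerator and which population is used in the denominator, not in the calculation itself.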
5.4 Conclusion
5.5 Summary
The most usual measures of morbidity are the incidence and prevalence
rates.
Knapp, R.G (1985). Basic Statistics for nurses, 2nd edition, Delmar
Publishers Inc. N.Y.
Hill, A.B. and Hill, I.D. (1991). Principles of Medical Statistics, 12th
edition, E.Arnold, London.