Study Guide - STTN111PVEC2024
No part of this study guide may be reproduced in any form or in any way without the written permission of the publishers.
It all starts here
• Ranked in the top 5% of universities globally by the QS-rankings
• Contributes the second largest number of graduates annually to the labour market
Module information
Module code: STTN 111
Module credits: 12
Name of lecturer(s):
Office telephone:
Email address:
Consulting hours:
Note: It is your responsibility to ensure that you are registered for this course.
Word of welcome
The Subject Group Statistics from the School of Mathematical and Statistical Sciences
welcomes you to this module. We hope that you will benefit from the interesting and
useful contents of the module – especially for your specific field of study.
Module rationale
Statistics can be used in almost every field imaginable. This module offers you the
opportunity to gain good general background knowledge on basic statistical principles
and methods as well as the basic practical skill of making sense of numbers. This is a
basic Statistics module required for most study fields. This module will be a
prerequisite for any subsequent Statistics module required for a specific study
direction. It would be prudent to complete this module prior to conducting a research
project.
Prerequisites
Statistics is not Mathematics, although numbers form its underlying basis. A basic
understanding of numbers and mathematics is an advantage for completing this
module. Practical exercises will be done on a computer, so basic computer skills are also
an advantage.
How to approach the module and use the study guide
It is very important to keep up with the course work right from the beginning, since the
study material is extensive. Students are expected to attend every contact
session.
In this module you will be expected to solve practical problems and interpret the
results. We, therefore, emphasise practical application rather than theoretical proof.
The best way of preparing for tests is to complete exercises in the textbook and
determine whether you have mastered each section.
If not, go through the work again to determine where you went wrong.
Senior students present weekly facilitation classes to assist you in mastering the work.
Do not hesitate to consult your lecturer on problems you are unable to solve.
Evaluation
Before a student is admitted to the examination, he/she must submit proof that
he/she has been an active participant in the classes scheduled for the specific module.
This proof of participation (attending classes, participating in class discussions,
handing in assignments, writing class tests, writing the semester test, etc.) will
count towards the student’s participation mark.
Participation mark
Admission to the exam is subject to regular attendance of classes as well as a
participation mark of at least 40%.
Exam mark
To pass the examination you should obtain an examination mark of at least 40%.
Note that you have two examination opportunities. No medical certificates will therefore
be accepted as an excuse for being absent during the final examination.
Action verbs
• Name
Facts are briefly given.
• Describe
A more in-depth testing of knowledge. Characteristics, facts or results are given
in a logical and well-structured manner. No comments or reasoning is necessary.
• Define
A clear and concise explanation of a concept providing a clear reflection of its
meaning.
• Explain
An issue is clarified in order for the reader to understand it. Make use of
illustrations, descriptions and examples as well as reasons to motivate the results
and verdicts.
• Compare
This question should be answered with care. Do not discuss one issue and then
another. Facts, events or problems are set against each other to illustrate
similarities and differences.
• Discuss
This type of question assumes insight and discernment during application and
judgment. The different aspects of an issue or statement are examined in an
analytical way.
Where necessary, lecturers will give more information in class on the use of these and
other terms. It is to your own advantage to know what is expected of you when an
action verb (term) comes up in a test paper.
Module outcome
After the successful completion of the module, you should be able to:
• have a basic knowledge of the most important statistical techniques that are used
daily;
• interpret and apply the most important statistical techniques to simple problems;
and
• have a positive attitude towards Statistics.
Icons
• Assessment / Assignments
• Study material
• Example
• Reflection
Study unit 1
STATISTICS – INTRODUCTORY CONCEPTS
Study time
2 hours of lectures, 4 hours study and homework
Aim
In this study unit we will focus on why you need to take Statistics and the relevant
components of statistical calculations. Firstly, you should know which types of data
exist, since calculations differ between the different kinds of data. Lastly, you will
notice how much time you’ll save by using a computer program for assignments.
Study outcomes
Upon successful completion of the study unit, you should be able to:
• describe the aspects of statistics;
• illustrate with examples the importance of statistics for solving problems from
different sections of life;
• compare descriptive statistics with statistical inference;
• distinguish between discrete and continuous data;
• describe, with examples, the various scales which can be used to express the
different values that a variable can take on; and
• give the definition and examples of discrete and continuous data.
Study material
Chapter 1 of ESM
Individual activity
Explain the difference between statistical inference and descriptive statistics.
Study unit 2
SAMPLING
Study time
2 hours of lectures, 4 hours study and homework
Aim
In the previous study unit we saw that statistics plays a role in the collection of data as
part of the research process. In this study unit we will focus on the correct methods of
data collection. We will look at why it is not always possible to work with complete
census taking, but rather make use of smaller samples. We will also look at ways to
reduce/avoid sampling errors. Then we will learn how to practically draw different types
of samples and the advantages and disadvantages of each method. Lastly, we will
focus on questionnaires as a form of data collection when physical measures are not
possible.
Study outcomes
Upon successful completion of the study unit, you should be able to:
• motivate sampling as a scientific method;
differentiate between the concepts population and sample (see Section
2.1); and
discuss the advantages of sampling as opposed to a census (see Section
2.2).
• describe and apply a number of techniques, used to take representative
samples, on simple examples;
distinguish between random (probability) samples and non-probability
samples;
discuss random sampling procedures and list the different sampling
methods that fall under probability procedures;
discuss non-probability sampling procedures and list the different
sampling methods that fall under non-probability procedures;
explain why a particular method within a specific situation will lead to the
best result; and
Study material
Chapter 2 in ESM
Individual activity
A random sample is drawn from a group of 623 people. The following sequence of
random numbers is used to obtain the indices of the people to be selected.
12 73 45 94 30 04 01 77 14 34 52 19 31 47 56
The number of the fifth person chosen is:
1. 17
2. 219
3. 30
4. 714
5. 345
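Answer: Number 219. Because the group has 623 people, the digits are read in groups
of three: 127, 345, 943, 004, 017, 714, 345, 219. The numbers 943 and 714 are too
large and the second 345 is a repeat, so the first five people selected are 127, 345, 4,
17 and 219.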
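The selection in this activity can also be checked with a short program. The sketch below is only an illustration: the function name and its parameters are made up for this guide, and it assumes the digits are read from left to right in blocks of three, skipping numbers larger than the group size and numbers already chosen.

```python
def select_from_random_digits(row, population_size, width=3):
    """Read a row of random digits in blocks of `width` digits and keep every
    block that is a valid index (1..population_size) not chosen before."""
    digits = "".join(row.split())          # remove the spaces between the pairs
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        index = int(digits[start:start + width])
        if 1 <= index <= population_size and index not in chosen:
            chosen.append(index)
    return chosen


row = "12 73 45 94 30 04 01 77 14 34 52 19 31 47 56"
selected = select_from_random_digits(row, population_size=623)
print(selected)        # the people selected, in order
print(selected[4])     # the fifth person chosen
```

Changing `population_size` or `width` shows how the same row of random numbers gives a different sample for a group of a different size.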
Work through Examples 2.4, 2.6 and 2.7 in ESM as if they are self-evaluation exercises.
Harry and Sally, 5th year graphic design students, want to conduct a study to
determine the hostel residents’ opinions on who will be the 2005 Rag float winners.
There are 3000 students staying in the 15 hostels on campus. Harry and Sally
decide to create a sample of 10 students from the possible 3000. They proceed as
follows:
They select the first first-year student they come across and send him/her
to a randomly chosen hostel.
The first-year student’s assignment is then to ask the first 10 residents of
that hostel who they think will be the Rag float winners for 2005.
1. Discuss, with reference to Harry and Sally’s study, the 3 errors which can occur
when sampling.
2. Suggest a solution to each of these 3 errors.
3. What type of data is the first-year student collecting?
Study unit 3
FREQUENCY DISTRIBUTIONS AND GRAPHICAL PRESENTATION
OF DATA
Study time
4 hours of lectures, 10 hours study and homework
Aim
In this study unit you will learn how to summarise data and present it so that it is clear
at first glance. We are referring to tables and graphs that are applicable to two different
types of data (discrete and continuous). Tables and graphs are used to simplify data.
Study outcomes
Upon successful completion of the study unit, you should be able to:
• tabulate data;
scientifically draw up a frequency table of continuous data;
construct cumulative frequency tables;
construct relative frequency tables as well as relative cumulative
frequency tables;
scientifically draw up a frequency table of discrete data;
scientifically draw up a cumulative frequency table of discrete data;
scientifically draw up a relative frequency table and relative cumulative
frequency table of discrete data;
construct and interpret the relevant information from the above-mentioned
tables.
• graphically present data;
construct a dot plot for continuous data;
graphically represent continuous data correctly using a histogram;
draw up a frequency polygon and cumulative frequency polygons;
draw up relative frequency polygons as well as relative cumulative
frequency polygons;
Study material
Chapter 3 in ESM
Individual activity
Note the steps involved in setting up a frequency distribution and how they are illustrated
in Table 3.3. Sturges’ rule gives an indication of the number of classes required.
Study Table 3.4 in order to understand the calculation of cumulative frequencies.
Study Tables 3.5 and 3.6 to determine how relative, percentage and relative
cumulative frequency tables are constructed.
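The construction of these tables can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of ESM: the marks are made-up data, Sturges' rule is taken in its common form k = 1 + log2(n) rounded up, and the class width is chosen so that the largest observation falls in the last class. The function names are hypothetical.

```python
import math


def sturges_classes(n):
    # Common form of Sturges' rule (an assumption; ESM may state it differently)
    return math.ceil(1 + math.log2(n))


def frequency_table(data):
    """Return rows (lower, upper, f, cumulative f, relative f, relative cumulative f)
    for integer-valued data; each class is lower <= x < upper."""
    n = len(data)
    k = sturges_classes(n)
    low = min(data)
    width = (max(data) - low) // k + 1     # ensures the maximum lands in the last class
    counts = [0] * k
    for x in data:
        counts[(x - low) // width] += 1
    table, cumulative = [], 0
    for i, f in enumerate(counts):
        cumulative += f
        lower = low + i * width
        table.append((lower, lower + width, f, cumulative, f / n, cumulative / n))
    return table


# Hypothetical test marks of 20 students
marks = [52, 67, 71, 48, 85, 63, 59, 74, 66, 90,
         55, 61, 78, 69, 43, 82, 57, 64, 73, 68]
table = frequency_table(marks)
for lower, upper, f, cum_f, rel_f, rel_cum_f in table:
    print(f"{lower}-<{upper}  f={f}  cum={cum_f}  rel={rel_f:.2f}  rel cum={rel_cum_f:.2f}")
```

The last cumulative frequency must equal the number of observations and the last relative cumulative frequency must equal 1, which is a quick check on any table drawn up by hand.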
Complete the relevant parts of Exercises 1, 2, 3 and 5 of Chapter 3 in ESM.
Study Table 3.8 in order to determine how frequency tables can be constructed for
discrete data.
Study Tables 3.9 and 3.10 for an illustration of how cumulative and relative
frequencies can be calculated.
Complete the relevant parts of Exercises 1, 2, 3, 4 and 5 of Chapter 3 in ESM.
Study Figure 3.1 and Figure 3.2 to determine how dot plots and histograms are
constructed.
Study Figures 3.4, 3.5 and 3.6 in order to understand how cumulative polygons,
relative polygons, and relative cumulative polygons are constructed.
Study Figures 3.7, 3.8 and 3.9 to determine how dot plots, bar charts and pie charts
are constructed.
Study unit 4
DESCRIPTIVE MEASURES OF LOCATION
Study time
6 hours of lectures, 12 hours study and homework
Aim
In the previous study unit we summarised data by means of frequency tables and
graphical presentation. In this study unit we’ll learn how to calculate different
values (like averages) from our data. Each of these values gives us an idea of the
position/size of our data. We will calculate these values for ungrouped data (raw data)
and for grouped data (already in frequency table) of both continuous and discrete data.
Study outcomes
Upon successful completion of the study unit, you should be able to:
• describe the measures of location;
• identify and describe different measures of location;
• explain how the different measures of location describe the data;
• calculate the different measures of location;
• write down the formulae used for the arithmetic mean for ungrouped data
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and grouped data $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i m_i$;
• discuss the properties of the arithmetic mean and the mode;
• calculate the arithmetic mean and mode for both grouped and ungrouped data;
• calculate the median $\tilde{x}$ and quantiles for grouped and ungrouped data;
• make use of a graphical method to find the median and quantiles for grouped
data;
• describe the properties of the median and quantiles;
• distinguish between symmetric and asymmetric (skewed) distributions;
• distinguish between distributions skewed to the left (negatively skewed) and
skewed to the right (positively skewed); and
• describe the behaviour of $\bar{x}$, $\tilde{x}$ and $m_o$ in relation to each other in the
Study material
Chapter 4 in ESM
Individual activity
Complete the relevant parts of Exercises 1, 2, 3, 4, 5 and 6 of Chapter 4 in ESM.
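To check answers to such exercises, the measures of location can be computed with Python's standard `statistics` module. The sketch below uses made-up data; `grouped_mean` is a hypothetical helper that applies the grouped-data formula for the arithmetic mean from the study outcomes, assuming that the m_i are class midpoints and the f_i are class frequencies.

```python
from statistics import mean, median, multimode


def grouped_mean(midpoints, frequencies):
    """Arithmetic mean of grouped data: (1/n) * sum of f_i * m_i, with n = sum of f_i."""
    n = sum(frequencies)
    return sum(f * m for f, m in zip(frequencies, midpoints)) / n


# Ungrouped (raw) data, hypothetical values
raw = [4, 7, 7, 8, 9, 10, 11]
raw_mean = mean(raw)          # arithmetic mean
raw_median = median(raw)      # middle observation of the ordered data
raw_modes = multimode(raw)    # most frequent value(s), returned as a list

# Grouped data: class midpoints and frequencies, hypothetical values
midpoints = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]
table_mean = grouped_mean(midpoints, frequencies)

print(raw_mean, raw_median, raw_modes, table_mean)
```

For grouped data the result is an approximation of the mean of the raw data, because every observation in a class is treated as if it were equal to the class midpoint.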
Study unit 5
DESCRIPTIVE MEASURES OF SPREAD
Study time
6 hours of lectures, 12 hours study and homework
Aim
In this study unit we will calculate values that give us an idea of the data’s spread.
These values must be calculated for ungrouped and grouped data as well as for
discrete data and continuous data. Now that we have the knowledge of measures of
location and spread, we can focus on alternative graphical methods (other than those
discussed in Study Unit 3) to examine the distribution of the data before we carry on
with statistical computations.
Study outcomes
Upon successful completion of the study unit, you should be able to:
• explain how the most important measures of spread describe the data;
• calculate the measures for both grouped and ungrouped data;
• write down the formulae of these three different measures of spread;
• explain the advantages, disadvantages and differences between the measures;
• calculate the different measures, namely:
the range, $r$;
the interquartile range, $q_R$;
the quartile deviation, $q_d$;
for grouped and ungrouped data.
• write down the formulae for the standard deviation;
• explain the properties of the standard deviation;
• calculate the standard deviation, s, of grouped and ungrouped data;
• calculate the standard deviation, s, by making use of the programmed facility
on your calculator;
• explain the purpose of calculating relative measures of spread; and
Study material
Chapter 5 in ESM
Individual activity
Complete the relevant parts of Exercises 1, 2, 3, 4, 5 and 6 of Chapter 5 in ESM.
Work through the examples in Section 5.5 in ESM as if they are self-evaluation
exercises. Also calculate s with your calculator.
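Besides a calculator, a short Python script can be used to check the measures of spread. The sketch below is an illustration with made-up data and several assumptions: quartiles follow the default convention of `statistics.quantiles`, which may differ slightly from the method in ESM; the quartile deviation is taken as half the interquartile range; and `statistics.stdev` divides by n - 1 (use `statistics.pstdev` if your formula divides by n).

```python
from statistics import mean, quantiles, stdev

data = [4, 7, 7, 8, 9, 10, 11]                  # hypothetical ungrouped observations

data_range = max(data) - min(data)              # range, r
q1, q2, q3 = quantiles(data, n=4)               # the three quartiles
interquartile_range = q3 - q1                   # interquartile range
quartile_deviation = interquartile_range / 2    # quartile deviation (assumed definition)
s = stdev(data)                                 # standard deviation with divisor n - 1
coefficient_of_variation = 100 * s / mean(data) # s as a percentage of the mean

print(data_range, interquartile_range, quartile_deviation)
print(round(s, 4), round(coefficient_of_variation, 2))
```

The coefficient of variation is a relative measure of spread, so it can be used to compare data sets that are measured in different units or have very different means.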
Work through Example 5.4 as if it is a self-evaluation exercise.
Work through Example 5.5 in ESM as if it is a self-evaluation exercise. Pay particular
attention to the way in which outliers are detected in this section and the way in
which the box-and-whisker plot is constructed (Figure 5.5). It is important to note that
the lines of the box-and-whisker plot stop with the observation just smaller (or larger)
than outliers.
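The note about where the lines of the box-and-whisker plot stop can be illustrated as follows. This is a sketch under assumptions, not the procedure of ESM: an observation is flagged as an outlier when it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile, quartiles use the default convention of `statistics.quantiles`, and the data and the function name are made up.

```python
from statistics import quantiles


def box_plot_summary(data, k=1.5):
    """Quartiles, outliers and whisker ends for a box-and-whisker plot
    (outlier rule with factor k is an assumption for this sketch)."""
    values = sorted(data)
    q1, med, q3 = quantiles(values, n=4)
    spread = q3 - q1
    lower_limit, upper_limit = q1 - k * spread, q3 + k * spread
    outliers = [x for x in values if x < lower_limit or x > upper_limit]
    inside = [x for x in values if lower_limit <= x <= upper_limit]
    return {
        "q1": q1, "median": med, "q3": q3,
        "outliers": outliers,
        # The whiskers stop at the most extreme observations that are not outliers
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
    }


summary = box_plot_summary([4, 7, 7, 8, 9, 10, 11, 25])   # hypothetical data
print(summary)
```

In this made-up data set the value 25 is flagged as an outlier, so the upper whisker stops at 11, the largest observation that is not an outlier.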
Work through Example 5.6, where graphical methods are used to compare two data
sets.
Study unit 6
FITTING CURVES
Study time
5 hours of lectures, 11 hours study and homework.
Aim
You have already learned how to summarise a one-variable dataset and calculate
values that give an indication of certain characteristics of the data (location and
spread).
Fitting of curves indicates how two-variable data can be summarised. We will try to
determine whether a relationship exists between two continuous variables in the
dataset by examining the variables simultaneously. We want to know if a relationship
exists in order to use one variable to make predictions about the other.
The most basic sort of relationship is the linear relationship, which we will fit to the
data. Subsequently we will calculate values in order to determine the reliability of the
fit. We will also look at graphical methods to examine poor fits and learn to transform a
number of non-linear fits into linear fits.
Study outcomes
Upon successful completion of the study unit, you should be able to:
• determine the joint distribution of two discrete variables;
• determine the joint distribution of two continuous variables;
• draw and interpret a scatter plot of two variables;
• determine the linear relation between two continuous variables in a dataset;
• explain the characteristics of Pearson’s correlation coefficient;
• calculate and interpret Pearson’s correlation;
• calculate the equation of the least-squares straight line by hand for simple
problems and with a calculator for more difficult problems;
• calculate the predicted values by hand and calculator (or computer software);
• explain the meaning of the slope;
• determine whether the intercepts have any practical meaning;
Study material
Chapter 6 in ESM
Individual activity
Work through Examples 6.1 and 6.2 as if they are self-evaluation exercises.
Complete the relevant parts of Exercises 1, 2, 3 and 4 of Chapter 6 in ESM.
Work through Example 6.4 in ESM as if it is a self-evaluation exercise.
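Hand calculations of the least-squares line and Pearson's correlation coefficient can be checked with a short program. The sketch below uses the standard formulas based on sums of squares and cross-products; the data and the function name are made up, and the notation may differ from that of ESM.

```python
from math import sqrt


def least_squares_line(x, y):
    """Return (intercept, slope, r) for the least-squares straight line
    y = intercept + slope * x and Pearson's correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
intercept, slope, r = least_squares_line(x, y)

predicted = [intercept + slope * xi for xi in x]          # predicted y-values
residuals = [yi - pi for yi, pi in zip(y, predicted)]     # actual minus predicted
prediction_at_6 = intercept + slope * 6                   # prediction for a new x

print(round(intercept, 2), round(slope, 2), round(r, 4), round(r ** 2, 2))
```

The slope is the estimated change in y for a one-unit increase in x. For a straight-line fit, the square of r is the coefficient of determination, and the residuals of a least-squares line with an intercept always sum to zero.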
Study unit 7
TIME SERIES
Study time
3 hours of lectures, 7 hours study and homework
Aim
In this study unit we will look at the fitting of models in the bivariate situation, just as in
Study Unit 6. The difference here is that the x-axis will now be time. In other words,
we are now working with time-dependent data. Consequently we need to fit other types
of models. We will first look at the movement components found in this type of data
and then we will present a model which describes these components.
Study outcomes
Upon successful completion of the study unit, you should be able to:
• define a time-series;
• identify and discuss the different movement components which play a role in a
time-series;
• interpret the multiplicative model used to describe time-series data;
• discuss the movement components of a time series;
• calculate long-term movement by means of the least-squares method as well as
the method of moving averages;
• describe the advantages and disadvantages of the method of moving averages;
• make meaningful predictions for the time series.
Study material
Chapter 7 of ESM
Individual activity
Study Figures 7.1 (a)-(d) to better understand the movement components.
Work through Example 7.1 and Table 7.2 as if they are self-evaluation exercises. Study
Figure 7.4 carefully.
Complete Exercises 1 (a) and (b), 2(a), 3(a), (b) and (c) of Chapter 7 in ESM.
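The method of moving averages mentioned in the study outcomes can be sketched as follows. This is an illustration with a made-up series and a hypothetical function name; it assumes a simple centred moving average of odd order, which may differ from the order or the centring used in the examples of ESM.

```python
def moving_average(series, order=3):
    """Centred moving averages of odd order. The first and last
    (order - 1) / 2 periods do not receive a value."""
    if order % 2 == 0:
        raise ValueError("this sketch only handles odd orders")
    half = order // 2
    return [sum(series[i - half:i + half + 1]) / order
            for i in range(half, len(series) - half)]


# Hypothetical observations for nine consecutive periods
sales = [12, 15, 18, 14, 17, 20, 16, 19, 22]
smoothed = moving_average(sales, order=3)
print(smoothed)
```

The smoothed series is shorter than the original: with order 3, the first and the last period have no moving average. This loss of values at the ends is one of the disadvantages of the method when it is used for prediction.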
Study unit 8
PROBABILITY
Study time
9 hours of lectures, 17 hours study and homework
Aim
In this study unit we are going to look at probability theory and probability distributions.
Probability theory is the link between descriptive statistics and statistical inference and
is consequently fundamental to the rest of this module.
The normal distribution will be looked at, as well as the central limit theorem, which is an
important practical application of the normal distribution. This theorem is of utmost
importance for the application of inference.
Study outcomes
After the successful completion of this study unit, you should be able to:
• define sample spaces and events;
• define probability and use this concept when calculating simple probability
expressions;
• explain the basic concepts of random variables;
• apply the concept of a random variable;
• understand the relationship between a histogram and the density function of a
continuous random variable;
• explain the characteristics of the density function of a continuous random
variable;
• identify the different forms of density functions;
• describe the probability distribution of a discrete random variable;
• describe the probability distribution of a continuous random variable;
• reproduce the most important characteristics of the normal distribution;
• explain the role of the standard normal distribution;
• calculate probability expressions for normally distributed random variables with
the help of statistical tables;
• explain the difference between parameters and statistics;
Study material
Chapter 8 of ESM
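Probabilities read from the standard normal table can be checked with Python's `statistics.NormalDist`. The sketch below is an illustration with made-up values for the mean and standard deviation: it standardises an observation with z = (x - mu) / sigma and then uses the cumulative distribution function of the standard normal distribution in place of the table.

```python
from statistics import NormalDist

mu, sigma = 100, 15            # hypothetical mean and standard deviation of X
x = 130

z = (x - mu) / sigma           # standardise: number of standard deviations from the mean
standard_normal = NormalDist() # mean 0, standard deviation 1

p_below = standard_normal.cdf(z)        # P(X < 130) = P(Z < z)
p_above = 1 - p_below                   # P(X > 130)
p_within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)   # P(-1 < Z < 1)

print(z, round(p_below, 4), round(p_above, 4), round(p_within_one_sd, 4))
```

The same answers are obtained without standardising by using `NormalDist(mu, sigma).cdf(x)` directly. The standard normal distribution matters for hand calculations because a single table then serves every normal distribution.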
Study unit 9
PRACTICAL CONSIDERATIONS REGARDING SAMPLE
SURVEYING
Study time
2 hours of lectures, 4 hours study and homework
Aim
In Chapter 2 of Elementary Statistical Methods, sampling methods were discussed. In
this study unit attention will be given to the practical aspects of sampling. Some of the
aspects that will be discussed include the planning procedure, determining the sample
size and the design of questionnaires.
Study outcomes
After completion of this study unit you should be able to:
• describe the elements of the planning procedure regarding sampling, especially
referring to the purpose of the survey and the description of the population;
• discuss methods of data collection;
• discuss factors which might have an influence on the sample size;
• understand the concept of pilot surveys;
• discuss the following aspects regarding the design of questionnaires:
the design of the general form,
the order of the questions,
types of questions,
content of questions,
confidentiality of questionnaires,
testing of questionnaires,
field workers,
how to handle non-responses.
Study material
Chapter 12 of ESM
Individual activity
Do the relevant exercises in Chapter 12 of ESM.
Key concepts
8. Sample mean (Afrikaans: Steekproefgemiddelde; Setswana: Palogare ya sampole)
The sum of all the observations divided by the number of observations.
9. Sample median (Afrikaans: Steekproefmediaan; Setswana: Palo ya gare ya sampole)
The middle observation of an ordered set of data.
10. Sample mode (Afrikaans: Steekproefmodus; Setswana: Palo-ipoeletso ya sampole)
The value that occurs most often in the set of data.
11. Quartiles (Afrikaans: Kwartiele; Setswana: Kwataele)
The three quartiles divide ordered data into four equal parts.
12. Range (Afrikaans: Variasiewydte; Setswana: Renje)
The largest observation of a set of data minus the smallest observation.
13. Standard deviation (Afrikaans: Standaardafwyking; Setswana: Palo-phapogo tlhomamo)
The square root of the mean squared deviation from the sample mean.
14. Coefficient of variation (Afrikaans: Variasiekoëffisiënt; Setswana: Khoefišente ya palo-phapaano)
It expresses the standard deviation as a percentage of the sample mean.
15. Least squares curve (Afrikaans: Kleinste-kwadrate-kromme; Setswana: Mola wa kwateratiki nnye)
It is obtained by minimising the sum of the squares of the error terms.
16. Correlation coefficient (Afrikaans: Korrelasiekoëffisiënt; Setswana: Khoefišente ya kgolagano)
It is a measure of the linear relationship between two variables.
17. Coefficient of determination (Afrikaans: Bepaaldheidskoëffisiënt; Setswana: Khoefišente ya maikemisetso)
It gives an indication of how well the least squares curve fits the observed data.
18. Residuals (Afrikaans: Residue; Setswana: Masalela)
The difference between the actual y-value and the estimated y-value (usually obtained
from the least squares curve under some hypothesis).
19. Time series (Afrikaans: Tydreeks; Setswana: Tlhatlhamano-nako)
A collection of observations of a phenomenon or variable that are gathered in
chronological order.
20. Multiplicative model (Afrikaans: Multiplikatiewe model; Setswana: Motlolo wa katiso)
The product of the four movement components present in a time series.