Business Research Methods and Statistics Using SPSS (Chapter 7 - Describing and Presenting Your Data)
7
Describing and Presenting Your Data
‘A statistician can have his head in the oven, his feet in an icebox and say that on average
he feels fine’
‘Statisticians do all standard deviations’
‘Is it appropriate to use a pie chart in making a presentation at a baker’s convention?’
‘In what ways are measures of central tendency like valuable real estate – location!
location! location!’
‘First draw your curves then plot your data’
(Sources unknown)
Content list
Descriptive statistics – organizing the data
Measures of central tendency
The mode
The median
The mean
Measures of dispersal or variability
Range
Variance
Standard deviation
The quartiles and interquartile range
Using SPSS to calculate and display descriptive statistics
To obtain descriptive statistics
Tabulation by frequency distributions and cross-tabulation
To produce a simple frequency line graph by SPSS
Bar charts
Pie charts
Box and whisker plot
Stem and leaf display
Frequency histograms
1 The number of items or frequency of occurrence in each given value category or set
of values, i.e. the distribution of frequencies.
2 The average value of a set of values: this is the measure of the central tendency of
the data.
3 The spread of the values above and below the average: this is the measure of
dispersion or variability of the data.
4 The distribution of the values of a variable. What is its shape; is there a tendency to
bunch towards lower or higher values, i.e. skewness? Or is there an approximation to
a normal distribution?
This chapter will introduce the first three characteristics of a distribution of data noted
above, together with visual displays that offer concise ways of summarizing information.
The next chapter covers another major property of data – measures of skewness, which
tell us whether a distribution is symmetrical or asymmetrical. These properties of data
yield a relatively complete summary of the information that can be added to by pictorial
displays of charts, frequency distributions and graphs. They are the building blocks on
which more sophisticated calculations and comparisons can be based but no particular
measure is very meaningful taken on its own. In many cases, knowledge of only two of
these – central tendency and dispersion – is sufficient for research purposes and forms the
basis of more advanced statistics.
The mode
The mode is the simplest measure of central tendency and is easily determined in small
data sets merely by inspection. In larger data sets, it can be determined by producing a
stem and leaf diagram or histogram. The mode is defined as the most frequently occurring
score in a group of scores. It is the typical result, a common or fashionable one (à la
mode), but unfortunately not a mathematically sophisticated one. In the following set of
data, we can identify the mode as the value 8, as it occurs more frequently than any of the
other score values.
The mode. The observation that occurs most frequently in a set of data.
The distribution in the above example would be described as unimodal, as there is only
one score which occurs with a greater frequency than any other. In some distributions no
score occurs with greater frequency than any other. For the set of observations 10, 11, 13,
16, 18, 19 there is no mode at all. In other distributions there may be two or more modes.
A distribution with two or more modes is said to be multimodal. Multimodal covers a
range of possibilities. For example in the following list of data:
the distribution is bimodal with both 6 and 8 considered as modes. It is customary in such
cases to cite the two modes, but then the concept of the mode as the most typical score no
longer applies. So, while the mode is easy to obtain, there may be more than one mode or
even no mode in a distribution. In a rectangular distribution, where every score occurs
with the same frequency, every score shares the honour.
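As a rough illustration of these cases, the short Python sketch below counts how often each score occurs and reports every score tied for the highest frequency. The function name `modes` and the first and third data sets are invented for illustration; only the set 10, 11, 13, 16, 18, 19 comes from the text. Returning an empty list when every score occurs equally often is a convention assumed here to match the 'no mode' description above.

```python
from collections import Counter

def modes(scores):
    """Return the most frequent score(s); an empty list means 'no mode'."""
    counts = Counter(scores)
    top = max(counts.values())
    # If every distinct score occurs equally often, no score stands out.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == top)

unimodal = [5, 6, 8, 8, 8, 9, 10]      # hypothetical scores
no_mode = [10, 11, 13, 16, 18, 19]     # from the text: each score occurs once
bimodal = [4, 6, 6, 6, 7, 8, 8, 8, 9]  # hypothetical scores

print(modes(unimodal))  # [8]
print(modes(no_mode))   # []
print(modes(bimodal))   # [6, 8]
```

Because the function returns a list, a bimodal or multimodal result is reported in full rather than being reduced to a single, possibly misleading, value.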
We cannot rely on the mode to accurately reflect the centre of a set of scores, since we
can have several modes and even in a unimodal distribution the most frequently occurring
score may not always occur somewhere near the centre of a distribution. As a result, the
mode has limited usefulness as a measure of central tendency. However, the mode is the
only measure of central tendency that can be used with qualitative variables such as
employment status, blood type, ethnic group, and political party affiliation. For variables
that are inherently discrete, such as family size, it is sometimes a far more meaningful
measure of central tendency than the mean or the median. Whoever heard of a family with
the arithmetically correct mean of 4.2 members? It makes more sense to say that the most
typical family size is 4 – the mode. Other than this, the mode has little to recommend it
except its ease of estimation.
The median
Median (Mdn) means ‘middle item’. The median is the point in a distribution below
which 50% of the scores fall. It is determined by placing the scores in rank order and
finding the middle score. The size of the measurements themselves does not affect the
median. This is an advantage when one or two extreme scores can distort an arithmetical
average or mean (see below). The procedure for determining the median is slightly
different, depending on whether N, the number of scores, is odd or even.
For example, if we have a series of nine scores, there will be four scores above the
median and four below. This is illustrated as follows:
16 6 11 24 17 4 19 9 20
Arranged in order of magnitude these scores become:
4 6 9 11 16 17 19 20 24
The median is the fifth score, 16, with four scores below it and four above.
In our example, we had an odd number of scores, which made the calculation of the median
easy. Suppose, however, we had been faced with an even number of scores. This time there
would not be a central value, but a pair of central values. No real difficulty is presented
here, for the median is to be found halfway between these two values.
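Suppose we want the median of the following scores:
16 29 20 9 34 10 23 12 15 22
Placed in rank order, these numbers appear as follows:
9 10 12 15 16 20 22 23 29 34
The two central values are 16 and 20, so the median lies halfway between them, at 18.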
The median. The middle observation after all data have been placed in rank order.
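The odd and even cases can be sketched in a few lines of Python. The function below is a minimal illustration written for this purpose (its name and layout are our own, not an SPSS procedure); the two data sets are the ones given above, and the result is checked against Python's built-in `statistics.median`.

```python
import statistics

def median(scores):
    """Middle score of the ranked data; halfway between the two middle scores if N is even."""
    ranked = sorted(scores)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

odd_set = [16, 6, 11, 24, 17, 4, 19, 9, 20]
even_set = [16, 29, 20, 9, 34, 10, 23, 12, 15, 22]

print(sorted(odd_set), median(odd_set))    # ranked scores, then 16
print(sorted(even_set), median(even_set))  # ranked scores, then 18.0

# The standard library gives the same answers.
assert median(odd_set) == statistics.median(odd_set)
assert median(even_set) == statistics.median(even_set)
```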
Because the median is not sensitive to extreme scores it may be considered the most
typical score in a distribution. However, using the median often severely limits any
statistical tests that can be used to analyse the data further, since the median is an ordinal
or ranked measure. For example, medians from separate groups cannot be added and
averaged. It is therefore not widely used in advanced descriptive and inferential statistical
procedures.
The mean
The most widely used and familiar measure of central tendency is the arithmetic mean –
the sum of all the scores divided by the number of scores. This is what most people think
of as the average. Or in simple mathematical terms, the mean (M) is simply the sum of all
the scores for that variable (∑X) divided by the number of scores (N) or:
$$M = \frac{\sum X}{N}$$
The usual symbol for a sample mean is M although some texts use $\bar{X}$ (or ‘X bar’). The
letter X identifies the variable that has been measured. If we are concerned with the
population mean, some texts designate this as μ, the Greek letter mu (pronounced mew).
In this text, as we are usually dealing with sample means, we shall be generally using M.
The mean. The sum of all the scores in a distribution divided by the number of
those scores.
The mean is responsive to every score in the distribution. Change any score and you
change the value of the mean. The mean may be thought of as the balance point in a
distribution. If we imagine a seesaw consisting of a fulcrum and a board, the scores are
like bricks spread along the board. The mean corresponds to the fulcrum when it is in
balance. Move one of the bricks to another position and the balance point will change
(Fig. 7.1).
The mean is the point in a distribution of scores about which the summed deviations of
every score from the mean are equal to zero. When the mean is subtracted from a score,
the difference is called a deviation score. Those scores above the mean will have positive
deviations from it while the scores below the mean have negative deviations from it. The
sum of the positive and negative deviations is always zero. This zero sum is the reason
why measures other than actual deviations from the mean have to be used to measure the
dispersal or spread of scores round the mean.
Deviation score. The difference between an individual score and the mean of the
distribution.
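A small Python sketch makes the 'balance point' idea concrete. The five scores are hypothetical, chosen only so that the arithmetic is easy to follow: the mean is computed as the sum of the scores divided by N, and the deviation scores are then shown to sum to zero.

```python
scores = [3, 5, 6, 8, 13]            # hypothetical scores
mean = sum(scores) / len(scores)     # M = sum of X divided by N
deviations = [x - mean for x in scores]

print(mean)             # 7.0
print(deviations)       # [-4.0, -2.0, -1.0, 1.0, 6.0]
print(sum(deviations))  # 0.0
```

The negative deviations (−4, −2, −1) total −7 and the positive ones (1, 6) total +7, so they cancel exactly. With scores that are not whole numbers, a computer may report a sum that is very slightly different from zero because of rounding.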
Since the mean is determined by the value of every score, it is the preferred measure of
central tendency. For example, a corporation contemplating buying a factory and taking
over its operation would be interested in the mean salary of the workers in the factory,
since the mean multiplied by the number of workers would indicate the total amount of
money required to pay all the workers. A sociologist studying the factory’s community
would probably be more interested in the median salary since the median indicates the pay
of the typical worker.
A major advantage of the mean is that it is amenable to arithmetic and algebraic
manipulations in ways that the other measures are not. Therefore, if further statistical
computations are to be performed, the mean is the measure of choice. This property
accounts for the appearance of the mean in the formulas for many important statistics.
There are two situations in which the mean is not the preferred measure of central
tendency:
Skewed distributions
Suppose that the following data were obtained for the number of minutes required to load
a company lorry with a day’s deliveries: 10.1, 10.3, 10.5, 10.6, 10.7, 10.9, 56.9. The mean
is 120/7 = 17.1; the median is 10.6. Which number best represents the time taken? Most of
us would agree that it is 10.6, the median. The mean is unduly affected by the lone
extreme score of 56.9. If a distribution is extremely asymmetrical, the mean is strongly
affected by the extreme scores and, as a result, falls further away from what would be
considered the distribution's central area, where most of the values are located, and
becomes untypical, unrealistic and unrepresentative.
Because the median has the desirable property of being insensitive to extreme scores it
is unaffected. In the distribution of scores of 66, 70, 72, 76, 80 and 96, the median of the
distribution would remain exactly the same if the lowest score were 6 rather than 66, or
the highest score were 1996 rather than 96. The mean, on the other hand, would differ
widely with these other scores. Thus the median is preferred in skewed distributions where
there are extreme scores as it is not sensitive to the values of the scores above and below it
– only to the number of such scores.
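The two examples above can be checked with Python's `statistics` module. This is only a sketch for verification; the list named `altered` is our own construction, formed by replacing the lowest score with 6 and the highest with 1996 as the text describes.

```python
import statistics

loading_times = [10.1, 10.3, 10.5, 10.6, 10.7, 10.9, 56.9]
print(round(statistics.mean(loading_times), 1))  # 17.1
print(statistics.median(loading_times))          # 10.6

original = [66, 70, 72, 76, 80, 96]
altered = [6, 70, 72, 76, 80, 1996]   # extreme scores swapped in, as in the text

# The median is the same for both lists ...
print(statistics.median(original), statistics.median(altered))  # 74.0 74.0
# ... but the mean moves from about 76.7 to about 383.3.
print(round(statistics.mean(original), 1), round(statistics.mean(altered), 1))
```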
Qualitative data
Suppose that the dependent variable is ethnic group membership and we collect the
following data: European, Malay, Chinese, Indian, Thai, Korean. There is no meaningful
way to represent these data by a mean; we could, however, compute the mode and say the
most typical member of the particular organization is Chinese.
Computation of all three measures of central tendency is relatively easy and SPSS
produces them without much effort. We will show you how below. Each of the measures
of central tendency imparts different information and the three values obtained from one
distribution can be very different as they represent different conceptions of the point
around which scores cluster. The question then becomes which one should be used in what
situation? The choice is based on:
Level of measurement
The first consideration is the type of scale represented by the data. With a nominal scale,
the mode is the only legitimate statistic to use. Recall that the mode is determined only by
frequency of occurrence by category and not by the order of the variables or their
numerical values. For example, suppose that a city population is divided into three groups
on the basis of type of residence: 15% have privately rented accommodation, 60% live in
their own accommodation, and 25% live in public housing. We might report that the
‘average’ person lives in their own accommodation. In this case, we are using the mode
because this is the most frequent category and the data are nominal.
If we were talking about the ‘average’ salary of employees, we would most likely use
the median. That is, we would place all the salaries in order (ordinal scale) and then
determine the middle value. The median would be preferred over the mean because the
salaries of a few highly paid CEOs would distort the mean to a disproportionate extent and
it would not be the most typical. The median is an ordinal statistic and is used when data
are in the form of an ordinal scale.
With an interval or ratio scale the mean is the recommended measure of central
tendency, although the median or mode may also be reported for these types of scales. For
example, if we were reporting the number of items produced by a factory on a daily basis
and seeking a measure of the average daily production over the month, this data would be
assumed to represent an interval scale, and the mean would generally be used.
Purpose
A second consideration in choosing a measure of central tendency is the purpose for which
the measure is being used. If we want the value of every single observation to contribute
to the average, then the mean is the appropriate measure to use. The median is preferred
when one does not want extreme scores at one end or the other to influence the average or
when one is concerned with ‘typical’ values rather than with the value of every single
case. If a city wanted to know the average taxable value of all the industrial property and
real estate there, then the mean would be used since every type of real estate would be
taken into consideration. However, if it wanted to know the cost of the average family
dwelling, the median would give a more accurate picture of the typical residence as it
would not be distorted by a few atypical luxury dwellings.
The median is also of value in testing the quality of products. For example, to determine
the average life of a torch battery we could select 100 at random from a production run
and measure the length of time each can be used continuously before becoming exhausted
and then take the mean. However, this mean will not be a good reflection of how batteries
as a whole last as a few batteries with lives that grossly exceed the rest or a few dud ones
will distort the figures. If the time is noted when half the batteries ‘die’ this median may be
used as a measure of the average life.
If the purpose of the statistic is to provide a measure that can be used in further
statistical calculations and for inferential purposes, then the mean is the best measure. The
median and mode are essentially ‘terminal statistics’ as they are not used in more
advanced statistical calculations.
The choice between the mean and the median as a measure of central tendency depends
very much on the shape of the distribution. The median, as was shown earlier, is not
affected by ‘extreme’ values as it only takes into account the rank order of observations.
The mean, on the other hand, is affected by extremely large or small values as it
specifically takes the values of the observations into account, not their rank order.
Distributions with extreme values at one end are said to be skewed. A classic example is
income, since there are only a few very high incomes but many low ones. Suppose we
sample 10 individuals from a neighbourhood, and find their yearly incomes (in thousands
of dollars) to be:
25 25 25 25 40 40 40 50 50 1000
The median income for this sample is $40,000, and this value reflects the income of the
typical individual. The mean income for this sample, however, is (25 + 25 + 25 + 25 + 40 +
40 + 40 + 50 + 50 + 1000)/10 = 132, or $132,000. A politician who wants to demonstrate that
their neighbourhood has prospered might, quite honestly, use these data to claim that the
average (mean) income is $132,000. If, on the other hand, they wished to plead for
financial aid for the local school, they might say, with equal honesty, that the typical
(median) income is only $40,000. There is no single 'correct' way to find an 'average' in
this situation, but it is obviously important to know which measure of central tendency is
being used.
As you can see, the word 'average' can be used fairly loosely and in media reports and
political addresses the particular average may not be identified as a mean or a median or
even a mode. Measures of central tendency can be misleading and, in the wrong hands,
abused. Hopefully you are better informed now.
Measures of dispersal or variability
As you can see, the variability of scores or spread of scores around the mean appears to
be the most prominent candidate, and we need to know how to measure this variability.
This concept of variability provides another way of summarizing and comparing different
sets of data.
The notion of variability lies at the heart of the study of individual and group
differences. It is the variability of individuals, cases, conditions and events that forms the
focus of research. We can actually derive a mean, median and mode for a set of scores
whether they have variability or not. On the other hand, if there is considerable variation,
our three measures of central tendency provide no indication of its extent. But they
provide us with reference points against which variability can be assessed.
Range
One method of considering variability is to calculate the range between the lowest and the
highest scores. This is not a very good method, however, since the range is considerably
influenced by extreme scores and in fact only takes into account two scores – those at both
extremes.
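The weakness of the range is easy to demonstrate. In the Python sketch below, the two sets of scores are hypothetical: one is tightly clustered apart from its two end values, the other is evenly spread, yet both produce the same range because only the lowest and highest scores enter the calculation.

```python
def value_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)

clustered = [10, 48, 49, 50, 50, 51, 52, 90]   # hypothetical scores
spread = [10, 20, 30, 40, 60, 70, 80, 90]      # hypothetical scores

print(value_range(clustered), value_range(spread))  # 80 80
```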
Variance
A better measure of variability should incorporate every score in the distribution rather
than just the two end scores as in the range. One might think that the variability could be
measured by the average difference between the various scores and the mean, M:
$$\frac{\sum (X - M)}{N}$$
This measure is unworkable, however, because some scores are greater than the mean and
some are smaller, so that the numerator is a sum of both positive and negative terms. (In
fact, it turns out that the sum of the positive terms equals the sum of the negative terms, so
that the expression shown above always equals zero.) If you remember, this was the
advantage of the mean over other measures of central tendency, in that it was the ‘balance
point’.
The solution to this problem however is simple. We can square all the terms in the
numerator, making them all positive. The resulting measure of variability is called the
variance or V. It is the sum of the squared deviations of every score from the mean, divided
by the total number of cases, or as a formula:
$$V = \frac{\sum (X - M)^2}{N}$$
Variance is the average squared deviation from the mean. Don’t worry about this
formula as SPSS will calculate it for you and produce all the descriptive statistics you
need. You will be shown how to do this later in this chapter. These simple mathematical
explanations are provided just so you can understand what these various statistics are.
An example of the calculation of the variance is shown in Table 7.1. As the table shows,
the variance is obtained by subtracting the mean (M = 8) from each score, squaring each
result, adding all the squared terms, and dividing the resulting sum by the total number of
scores (N = 10), to yield a value of 4.4.
Because deviations from the mean are squared, the variance is expressed in units
different from the scores themselves. If our dependent variable were costs measured in
dollars, the variance would be expressed in square dollars! It is more convenient to have a
measure of variability which can be expressed in the same units as the original scores. To
accomplish this end, we take the square root of the variance, the standard deviation.
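The steps of the variance calculation can be followed in a short Python sketch. The ten scores below are hypothetical: they were chosen so that M = 8 and N = 10, matching the summary figures quoted for Table 7.1, but they should not be read as the table's actual values.

```python
import statistics

scores = [4, 6, 7, 7, 8, 8, 9, 9, 10, 12]   # hypothetical: M = 8, N = 10
n = len(scores)
mean = sum(scores) / n
squared_deviations = [(x - mean) ** 2 for x in scores]
variance = sum(squared_deviations) / n       # divide by N, as in the text

print(mean)                     # 8.0
print(sum(squared_deviations))  # 44.0
print(variance)                 # 4.4

# Library check: pvariance divides by N; variance divides by N - 1.
print(statistics.pvariance(scores))
print(statistics.variance(scores))
```

Python's `statistics.variance` divides by N − 1 rather than N, which is the usual choice when a sample is used to estimate a population value. Statistical packages often report this version, so a figure from software may be slightly larger than one worked out by dividing by N.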
Table 7.1 Calculating variance
Standard deviation
The standard deviation is often symbolized as σ when referring to a population and ‘SD’
when referring to a sample, which in this book, and in most research, it usually is. The
standard deviation reflects the amount of spread that the scores exhibit around the mean.
The standard deviation is the square root of the variance. Thus:
$$SD = \sqrt{V} = \sqrt{\frac{\sum (X - M)^2}{N}}$$
In our example in Table 7.1, the SD is about 2.1, the square root of the variance which
we calculated as 4.4.
The standard deviation. The square root of the mean squared deviation from the
mean of the distribution.
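As a quick check of this arithmetic, the Python sketch below takes the square root of the variance of 4.4 quoted in the text. The list of scores is hypothetical (ten values chosen to have a variance of 4.4) and is included only to show that the library function `statistics.pstdev`, which divides by N, gives the same result.

```python
import math
import statistics

variance = 4.4                  # the variance quoted in the text for Table 7.1
sd = math.sqrt(variance)
print(round(sd, 1))             # 2.1

scores = [4, 6, 7, 7, 8, 8, 9, 9, 10, 12]   # hypothetical scores with variance 4.4
print(round(statistics.pstdev(scores), 4))  # 2.0976
```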
Interpreting the SD
Generally, the larger the SD, the greater the dispersal of scores; the smaller the SD, the
smaller the spread of scores, i.e. SD increases in proportion to the spread of the scores
around M as the marker point. Measures of central tendency tell us nothing about the
standard deviation and vice versa. Like the mean, the standard deviation should be used
with caution with highly skewed data, since the squaring of an extreme score would carry
a disproportionate weight. It is therefore recommended where M is also appropriate.
Figure 7.2 shows two distributions with different standard deviations: one with a clustered
appearance, the other with scores well spread out, illustrating clearly the relationship of
spread to standard deviation.
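The same point can be made numerically. In the Python sketch below, both sets of scores are hypothetical and have been given the same mean, so that the only difference between them is how far the scores lie from it.

```python
import statistics

clustered = [48, 49, 50, 50, 51, 52]   # hypothetical scores
spread = [20, 35, 50, 50, 65, 80]      # hypothetical scores

results = {}
for name, scores in (("clustered", clustered), ("spread", spread)):
    m = statistics.mean(scores)
    sd = statistics.pstdev(scores)     # divides by N, as in the text
    results[name] = (m, sd)
    print(f"{name}: M = {m}, SD = {sd:.2f}")
```

Both sets have a mean of 50, but the standard deviation of the spread-out set is many times larger, which is exactly the information the mean alone cannot convey.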
So, in describing an array of data, researchers typically present two descriptive
statistics: the mean and the standard deviation. Although there are other measures of
central tendency and dispersion, these are the most useful for descriptive purposes.
If you wish to compute separate sets of descriptive statistics for a qualitative variable,
say for men and women separately, after step 3 above, place the variable, e.g. gender, into
the Factor List box. This is what we have done in our example (Fig. 7.3). This will
provide descriptives on age for men and women separately.
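For readers who want to see what 'descriptives for each group separately' amounts to, here is a minimal Python sketch. The records are hypothetical and are not taken from the book's data file; the variable names are our own. The ages are first split by gender, and the same statistics are then computed within each group.

```python
import statistics
from collections import defaultdict

# Hypothetical (gender, age) records; not the book's data file.
records = [("female", 18), ("female", 19), ("female", 34), ("male", 20),
           ("male", 22), ("female", 18), ("male", 41), ("male", 19)]

ages_by_gender = defaultdict(list)
for gender, age in records:
    ages_by_gender[gender].append(age)

summary = {}
for gender, ages in ages_by_gender.items():
    summary[gender] = {
        "N": len(ages),
        "mean": round(statistics.mean(ages), 2),
        "median": statistics.median(ages),
        "SD": round(statistics.stdev(ages), 2),  # sample SD (N - 1 divisor)
    }
    print(gender, summary[gender])
```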
Figure 7.3 Explore dialogue box.
The top sub-table reveals the number of cases and whether there is any missing data.
Missing cases are the number of scores which have been disregarded by SPSS for the
purposes of the analysis. There are none in this example.
The important statistics lie in the much larger bottom descriptives table, namely the
mean, median, variance and standard deviation. For example, the mean female age was
21.78, the median was 18 and the standard deviation was 8.46 (rounded).
There are many other statistical values that have been calculated, such as the
interquartile range, 95% confidence intervals, skewness, variance, range,
maximum and minimum score, etc. In the next few chapters you will be introduced to
those you have not yet met. You would not report all the measures displayed but
reproduce those of interest in a more simplified form, omitting some of the clutter of
detail. Remember that these descriptive statistics are produced on the Explore menu.
As well as using Explore, you can also obtain a smorgasbord of descriptive statistics
from Descriptives. These include the mean, sum, standard deviation, range, standard
error of the mean, maximum and minimum score, and skewness. Try out the
descriptives menu in your own time. It is easy to use.
While SPSS reports this data to three decimal places, two decimal places are usually more
than enough for most social science and business data. Measurement in these fields does
not need to be as precise as in the physical sciences, so three decimal places is
overkill and implies a precision not warranted.
Table 7.2 Example of descriptive statistics produced by the Explore procedure
Sometimes SPSS will report values with a confusing notation like 7.41E-03. This
means move the decimal point 3 places to the left. So 7.41E-03 becomes a more familiar
.00741. In the same way a figure like 7.41E+02 becomes 741.0 since the + sign tells us to
move the decimal point 2 places to the right.
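This 'E' notation is not peculiar to SPSS; most software reads and writes it. The Python lines below, offered simply as a way of checking your own conversions, turn the two examples above into ordinary decimals and then convert one of them back again.

```python
small = float("7.41E-03")
large = float("7.41E+02")

print(f"{small:.5f}")    # 0.00741
print(f"{large:.1f}")    # 741.0
print(f"{0.00741:.2E}")  # 7.41E-03
```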
While it is important to be able to demonstrate means, standard deviations, etc., little
or no sense can be got out of any series of numbers until they have been set out in some
orderly and logical fashion (usually a table or chart like a histogram) that enables
comparisons to be made. Never present data to management or clients in a raw form. They
need to be grouped in some way and summarized, so that we can extract the underlying
pattern or profile, make comparisons and identify significant relationships between the
data. What are the figures, given half a chance, trying to tell us? Tabulation is that first
critical step in patterning the data.
Tabulation by frequency distributions and cross-tabulation
A frequency distribution table presents data in a concise and orderly form by recording
observations or measures in terms of how often each occurred. We have already produced
frequency tables in Chapter 3 when dealing with the initial screening of the data for
accuracy of input (Tables 3.2(b) and (c)).
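The logic of a frequency table is simply counting. The Python sketch below builds one from a list of hypothetical responses (invented for illustration, not drawn from the book's data files), reporting the frequency and percentage for each category.

```python
from collections import Counter

# Hypothetical responses to a 'main method of transport' question.
responses = ["car", "bus", "car", "walk", "cycle", "bus", "car", "walk", "car", "bus"]

counts = Counter(responses)
total = len(responses)

print(f"{'Category':<10}{'Frequency':>10}{'Percent':>10}")
for category, frequency in counts.most_common():
    print(f"{category:<10}{frequency:>10}{100 * frequency / total:>10.1f}")
print(f"{'Total':<10}{total:>10}{100.0:>10.1f}")
```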
A useful extension of the simple frequency table is the cross-tabulation table which
tallies the frequencies of two variables together. SPSS possesses the cross-tabulation
feature which we will demonstrate using Chapter 7 SPSS data file B. Access that file now
on the SPSS data file Web page of this book.
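Before turning to SPSS, it may help to see what a cross-tabulation does in principle: it counts how often each combination of two categories occurs. The Python sketch below uses hypothetical gender and transport pairs (not the contents of the book's data file) and prints a small table with row totals.

```python
from collections import Counter

# Hypothetical (gender, transport) pairs; not the book's data file.
cases = [("female", "car"), ("male", "car"), ("female", "bus"), ("female", "cycle"),
         ("male", "bus"), ("male", "car"), ("female", "car"), ("male", "walk")]

cell_counts = Counter(cases)                      # frequency of each pairing
rows = sorted({gender for gender, _ in cases})
columns = sorted({transport for _, transport in cases})

print(f"{'':<8}" + "".join(f"{c:>7}" for c in columns) + f"{'Total':>7}")
for r in rows:
    row = [cell_counts[(r, c)] for c in columns]  # missing pairings count as 0
    print(f"{r:<8}" + "".join(f"{n:>7}" for n in row) + f"{sum(row):>7}")
```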
To obtain a cross-tabulation
Figure 7.5 Example of line graph (frequency polygon) of age from data set Chapter 7 SPSS B.
Figure 7.6 Define Simple Bar dialogue box.
Bar charts
A common method of presenting categorical data is the bar chart where the height or
length of each bar is proportional to the size of the corresponding number.
1 Using data file SPSS Chapter 7 B click on Graphs >> Bar … on the drop down menu.
2 The Bar Chart dialogue box provides for choice among a number of different bar
chart forms. Choose Simple for this demonstration. Then Define.
3 The Define Simple Bar dialogue box emerges with a variety of options for the
display. We have chosen N of cases but there are other options for you to explore
(Fig. 7.6).
4 Transfer the required variable – in our example Main method of transport – into the
Category Axis box.
5 Click OK and the output presents the Bar Chart as in Figure 7.7.
A vertical bar is erected over each category or class interval such that its height
corresponds to the number of occurrences or scores in the interval. The bars can be any
width, but they should not touch, since this emphasizes the discrete, qualitative character
of the categories. Both axes should be labelled and a title provided.
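The principle that bar length is proportional to frequency can be shown even without graphics software. The Python sketch below draws a crude text bar chart from hypothetical frequencies; the function name `bar_chart` and the figures are our own, and the bars run horizontally simply because that is easier to print.

```python
# Hypothetical frequencies for a categorical variable.
frequencies = {"car": 12, "bus": 7, "walk": 4, "cycle": 2}

def bar_chart(freqs, title):
    """Return a text bar chart: one bar per category, length = frequency."""
    width = max(len(label) for label in freqs)
    lines = [title]
    for label, count in freqs.items():
        lines.append(f"{label:<{width}} | {'#' * count} {count}")
    return "\n".join(lines)

print(bar_chart(frequencies, "Main method of transport (N of cases)"))
```

Each category sits on its own line with a gap between the bars, echoing the rule that bars for qualitative categories should not touch.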
Figures 7.8 (a) and (b) are examples of clustered bar charts of the same data in which
each category of transport is split by gender. Figure 7.8 (a) analyzes the gender split in
each category by N while the second displays the data as a percentage. They illustrate how
easy it is to pick up the main features such as the fact that no male cycles and that
percentages are very similar for other categories, though this is not apparent in terms of
numbers. When displaying data, therefore, experiment with different
displays to obtain one which is suitable for your purpose.
Figure 7.7 Example of bar chart.
Figure 7.8 (a) Example of clustered bar chart (N of cases). (b) Example of clustered bar chart (by percentage).
Figure 7.9 Two-directional bar chart.
The two-directional bar chart has bars going in opposite directions to indicate positive
and negative movements from an assumed average, or norm. Figure 7.9 presents data for
annual profitability of five branches of a supermarket chain. This form of bar chart is
particularly useful in highlighting differences in movements of a variable between
different regions or countries or over different time periods.
The way information can be displayed on a bar chart is limited only by the ingenuity of
the person creating the display.
Pie charts
A pie chart can be used as an alternative to the bar chart to show the relative size,
contribution or importance of the components, as in Figure 7.10. It can be found under
Graphs on the drop down menu. Perhaps this is the most easily visually interpreted graph,
merely a circle divided into sectors representing proportionate frequency or percentage
frequency of the class intervals/categories. The last stage of construction is labelling the
sections of the pie, placing percentages on the slices and providing an appropriate title.
Use Chart Editor for this. For example, to place percentages on the pie:
There are two major disadvantages with pie charts. Firstly, comparison between sectors
that are similar in size is difficult unless percentages are placed on the sectors.
Secondly, negative quantities cannot be recorded. For
example, in splitting the pie chart into sectors representing the amount of profit each
department made in the year, you cannot show the loss made by one department.
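Both the construction of a pie chart and its second limitation can be sketched in Python. Each component's share of the total gives its percentage, and the same share of 360 degrees gives the angle of its sector. The cost figures and the function name `pie_sectors` are hypothetical, and refusing negative values is our own way of flagging the problem rather than anything SPSS is claimed to do.

```python
# Hypothetical cost components of a budget (all positive).
costs = {"wages": 40, "materials": 30, "energy": 20, "other": 10}

def pie_sectors(components):
    """Return {label: (percentage, angle in degrees)} for a pie chart."""
    if any(value < 0 for value in components.values()):
        raise ValueError("a pie chart cannot show negative quantities")
    total = sum(components.values())
    return {label: (100 * value / total, 360 * value / total)
            for label, value in components.items()}

for label, (percent, angle) in pie_sectors(costs).items():
    print(f"{label:<10}{percent:>6.1f}%{angle:>8.1f} degrees")
```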
Frequency histograms
A histogram is similar in appearance and construction to a bar graph, but it is used to
display the frequency of quantitative variables rather than qualitative variables. A bar or
rectangle is raised above each score interval on the horizontal axis. Successive bars are
placed touching each other to show the continuity of the scores in continuous data (unlike
bar graphs where there is separation). An empty space should also be left at any interval
where there is no data to record.
The vertical axis should be labelled f, or frequency, and the horizontal axis labelled to
show what is being measured (scores, weight in kg, employee age groups, reaction time in
seconds, sales per month and so on). As usual, a descriptive title, indicating what the
graph is showing, is always placed with the graph.
A histogram is shown in Figure 7.13 of the variable age from Chapter 7 data file SPSS
B. Note that the edges of the bars coincide with the limits of the class intervals in blocks
of five years, e.g. 17.5–22.5 with 20 as the midpoint. There are no cases that fall in the
range 42.5–47.5 years old. Histograms are also located in the drop down menu under
Graphs.
A histogram has one important characteristic which the bar chart does not possess. A
bar chart is in one dimension representing a single magnitude. The height or length of the
bar corresponds to the magnitude of the variable, the width of the bar is of no
consequence. A histogram, however, has two dimensions, namely, frequency (represented
by the height of the bar or rectangle) and width of the class (represented by the width of
the bar). It is the area of the bar which is of significance.
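The grouping that lies behind a histogram can be sketched as follows. The ages are hypothetical, and the class intervals are assumed to be five years wide with edges at 17.5, 22.5 and so on, as in the description above. Each age is assigned to its interval, and intervals with no cases are still printed so that the gap remains visible.

```python
# Hypothetical ages; class intervals are 5 years wide with edges at 17.5, 22.5, ...
ages = [18, 18, 19, 19, 20, 21, 22, 23, 24, 26, 28, 31, 33, 36, 40, 49]
lower_edge, width = 17.5, 5

counts = {}
for age in ages:
    index = int((age - lower_edge) // width)      # which interval the age falls in
    counts[index] = counts.get(index, 0) + 1

for index in range(max(counts) + 1):
    low = lower_edge + index * width
    frequency = counts.get(index, 0)              # empty intervals stay visible
    print(f"{low:>4.1f}-{low + width:<4.1f} {'#' * frequency} {frequency}")
```

Because every interval here has the same width, the area of each bar is proportional to its height, so comparing heights is the same as comparing areas. With unequal class widths that would no longer hold.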
Figure 7.13 Example of histogram.
Editing charts
Charts can be edited in many ways in the output viewer to enhance them for presentation.
Among other ‘goodies’, you can insert titles, add 3D effects, colour fill, explode sectors of
a pie chart, add percentages to pie chart slices, etc. Double-click on the chart to bring up
the Chart Editor. Play around modifying your output using the various menus on the
chart editor.
Review questions
Qu. 7.1
(a) What is the mode in the following set of numbers?
3, 5, 5, 5, 7, 7, 9, 11, 11.
(b) Is the mode in the following set uni-, bi- or tri-modal?
4, 4, 5, 5, 6, 7, 7, 7, 8, 8, 9, 9, 9, 12, 13, 13.
Check your answer on the chapter website.
Qu. 7.2
What is the median of the following set of numbers: 23, 16, 20, 14, 10, 20, 21, 15, 18?
Check your answer on the website.
Qu. 7.3
Explain in what circumstances the median is preferred to the mean.
Qu. 7.4
List the advantages and disadvantages of each of the mean, median and mode.
Check your answers to the following multiple choice items on the Chapter 7 Web page.
1 A figure showing each score and the number of times each score occurred is a:
(a) histogram
(b) frequency distribution
(c) frequency polygon
(d) frequency polygram
2 The frequency of a particular value plus the frequencies of all lower values is:
(a) the summated frequency
(b) the additive frequency
(c) the cumulative frequency
(d) the relative frequency
3 A display of raw data which combines the qualities of a frequency distribution and a
graphic display of the data is a:
(a) root and branch
(b) principal and secondary
(c) stem and leaf
(d) pre and post
4 In drawing a pie chart we have total costs of running a factory as $12,000,000. If
wages and salaries are $3,000,000 what proportion of the pie is the sector
representing this cost element?
(a) 30%
(b) 12%
(c) 25%
(d) 33%
5 You are told that 6 employees are needed to load the trailer. The figure 6 is the:
(a) percentage
(b) proportion
(c) frequency
(d) dependent variable
6 As the numbers of observations increase the shape of a frequency polygon:
(a) remains the same
(b) becomes smoother
(c) stays the same
(d) varies with the size of the distribution
7 If a set of data has several extreme scores, which measure of central tendency is most
appropriate?
(a) mode
(b) median
(c) mean
(d) variance
8 The mode is preferred when:
(a) there are few values
(b) the data is in rank order
(c) a typical value is required
(d) there is a skewed distribution
9 In a box plot the box represents:
(a) the data
(b) the quartile range
(c) the middle 50% of values
(d) the median
10 The usual measure of dispersal is:
(a) the variation
(b) the standard variance
(c) the standard difference
(d) the standard deviation
11 When you have categorical variables and count the frequency of the occurrence of
each category your measure of central tendency is:
(a) the mean
(b) the mode
(c) the median
(d) you would not need one
12 If the standard deviation for a set of scores was 0 (zero) what can you say about the
distribution?
(a) the mean is 0
(b) the standard deviation cannot be measured
(c) all the scores are the same
(d) the distribution is multi-modal
Now access the website for Chapter 7 and attempt the additional questions and
activities there.