PSY123 Lecture 10-1

Uploaded by

snazo.dilo
Copyright
© All Rights Reserved

Recap: Lecture 9

PSY123-Introduction to research methods and statistics
• Chapter 2 (pgs. 26-38), Reader (pgs. 21-27)
• Step 4: data analysis – analysing quantitative data (introduction to statistics)
• Coding – assigning numerical values to variables or responses
• Cleaning the data set
• Descriptive statistics vs inferential statistics
• Summarising data: frequency distributions, i.e., ungrouped frequency distributions or grouped frequency distributions
• Stem and leaf
Lecture 10
PSY123-Introduction to research methods and
statistics
Reader (pgs. 36-39)
• Graphic representation of frequency distributions
Reader (pgs. 40-43), Chapter 2 (pgs. 39-40)
• Measures of central tendency for grouped and ungrouped frequency
distributions
Reader (pgs. 43-46), Chapter 2 (pg. 40)
• Measures of variability
Graphic presentation of data

• The stem and leaf display can also be viewed as a pictorial or
graphic presentation of scores.
• However, when we think of graphs, we often think of traditional
ones such as bar charts, histograms and the frequency polygon.
• Graphic presentation is another way of presenting data and
information. Usually, graphs such as those named above are used to
present frequency distributions.
• Most graphs use two lines that are perpendicular (placed at right
angles) to one another
• The horizontal line is called the x-axis and the vertical line is called
the y-axis
• To summarise frequencies (the occurrence of scores), the score is
listed on the x-axis, while the frequencies are listed on the y-axis.
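As a rough illustration of the axes described above, the sketch below counts how often each score occurs and prints a simple text chart. Each row starts with a score (what would sit on the x-axis) and the number of asterisks stands in for the frequency (what would be read off the y-axis). The scores are invented for this illustration and do not come from the lecture.

```python
from collections import Counter

# Hypothetical scores, invented for this illustration
scores = [3, 5, 3, 4, 5, 3, 2, 4, 3]

# Count how often each score occurs (its frequency)
frequencies = Counter(scores)

# One row per score; the run of asterisks stands in for the bar
for score in sorted(frequencies):
    print(f"{score}: {'*' * frequencies[score]} ({frequencies[score]})")
```

The text chart is turned on its side compared with a drawn graph, but the pairing of score and frequency is the same.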
Bar chart vs Histogram

• Two of the traditional graphs that we may be familiar with are the bar
chart and the histogram
• So how different or alike are these graphs?
Gender example

Bar chart
• The bar chart is normally used for variables
with discrete or distinct categories (such as
gender or seasons)
• It is a pictorial representation of data that uses
bars to compare different categories of data.
• The vertical bars on the chart do not touch
each other, indicating discontinuity.
• Categories are listed on the x-axis and the
frequencies on the y-axis.
• The height of each bar corresponds to the
frequency for each category
• The gaps between bars are used to indicate
discrete/distinct categories
Gender example

Histogram
• The histogram is normally used for continuous,
numerical data (ranging from low to high) or
data that is grouped in different classes.
• It represents the frequency distribution of
continuous variables.
• The vertical bars on this chart do touch each
other.
• The x-axis represents groups of scores
organized into classes, or numbers which are
categorised together to represent ranges of
data (in the example, we note that the
midpoint of each class is listed on the x-axis and
not the various classes)
• The height of each bar corresponds to the
frequency for each class
Frequency polygon
• An alternative to the histogram to show frequencies is to use a frequency
polygon
What is a frequency polygon?
• It is, like the bar chart and histogram, a graphical form of representation of data. It
is used to depict the shape of the data and to depict trends.
• It is also known as a line graph
• It is drawn by calculating and plotting the frequencies of the different data
values and then connecting the plotted dots or midpoints with straight
lines
• The midpoint or class mark can be calculated as follows: first add the
lower class limit and the upper class limit to get the total, then divide
the total by 2
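The midpoint calculation above can be sketched in a few lines of Python. The function name `class_midpoint` and the class limits are assumptions made for illustration; they are not taken from the lecture's gym attendance table.

```python
def class_midpoint(lower_limit, upper_limit):
    """Return the midpoint (class mark) of a class interval."""
    return (lower_limit + upper_limit) / 2


# Hypothetical class intervals, invented for this illustration
classes = [(10, 14), (15, 19), (20, 24)]

# One midpoint per class; these are the x-axis values of a frequency polygon
midpoints = [class_midpoint(lower, upper) for lower, upper in classes]
print(midpoints)  # [12.0, 17.0, 22.0]
```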
…continued
• In this polygon the midpoint of
each class is listed on the x-axis.
• Above each midpoint a solid dot is
placed corresponding to the
frequency of that class. The dots are
then joined together with the line.
• Also note that two additional
midpoints were added, one below
the lowest class (16) and one above
the highest class (49). This is done to
anchor the lines of the frequency
polygon to the x-axis.
Shapes of distribution
• We know now that graphs are a way of summarising and presenting data visually.
• However, they also play an important role in describing the shape of the distribution of scores
• The shape of a distribution plays a crucial role in the selection of appropriate statistical
techniques.
• The 4 shapes below are the most common shapes of a distribution of scores
…continued
• The figure represented shows a distribution
in statistics that is often desired and thus
known as a normal distribution
• In this distribution the majority of scores
are clustered in the centre of the
distribution, and then trail off towards the
upper and lower extremes
• The normal distribution is bell shaped
• It is also called a symmetrical distribution
since the left and right halves of the
distribution are mirror images of each
other.
…continued

• The top two shapes represent what are known as asymmetrical
distributions
• In these distributions there is
either a cluster of extremely high
scores (negatively skewed
distribution) or a cluster of
extremely low scores (positively
skewed distribution).
• The peak of a distribution indicates
where the highest frequency of
scores is located.
Positively skewed
• In the gym attendance example, the
peak is at the lower end of the
graph (thus most of the scores are
clustered at the lower end), while
the tail is at the higher end. This
makes it a positively skewed
distribution
…continued

• If a distribution has only one peak
it is referred to as unimodal
• Distributions with two peaks (i.e.,
two clusters of scores with the
highest frequencies) are referred
to as bimodal
Numerical summary measures
• In addition to tables and graphs we could also use some numerical indicators to describe the
essential characteristics of our data.
These indices are divided into:
• measures of central tendency: a single value that in some way represents all the values in a
distribution, and
• measures of variability: a single value that describes the spread (i.e., variability) of scores in a
distribution.
Measures of central tendency
• Measures of central tendency provide a researcher with a single
numerical value/score that represents all the values of a particular
variable in the dataset.
• These indicate what is typical or average about the dataset.
• There are three measures of central tendency, namely the mode,
median and mean
Measures of central
tendency: Mode
• The mode refers to the score
that occurs most frequently
in a set of scores, i.e., the
number that occurs the
highest number of times.
• In the 2nd example, the scores
3 and 10 each occur four
times, so both of them
are considered to be the
mode. This is a
bimodal distribution
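A minimal sketch of finding the mode in Python is given below. The score list is hypothetical: it was built so that 3 and 10 each occur four times, as in the second example, because the lecture's actual scores are not reproduced in the text. `statistics.multimode` (Python 3.8 or later) returns every score that shares the highest frequency.

```python
from collections import Counter
from statistics import multimode

# Hypothetical scores in which 3 and 10 each occur four times
scores = [3, 3, 3, 3, 5, 7, 10, 10, 10, 10]

# Frequency of every score, to see where the highest counts are
frequencies = Counter(scores)

# All scores tied for the highest frequency
modes = multimode(scores)

print(frequencies[3], frequencies[10])  # 4 4
print(modes)                            # [3, 10]
```

Two modes are returned, which is what makes this set of scores bimodal.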
Measures of central tendency: Median
• The median is the middlemost score when the scores are
ordered from lowest to highest. The Oxford Dictionary describes
a median as (that) ‘situated in the middle’.
• For example: We have a student who received the following
grades on five tests during her first-year psychology course: 34%,
88%, 34%, 50% and 78%.
What would the median be?
• To find that, we first order the scores from the lowest to the
highest:
34%, 34%, 50%, 78%, 88%
• We find that the median is 50% because it falls right in the
middle of the scores, with two scores on each side of it.
What happens if we have an even number of values?
• Where there is an even number of values, the median is found
by adding the two middle scores and dividing the sum by two
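The two cases above can be sketched as follows. The five grades are those from the example; the four scores used for the even case are hypothetical and were chosen only to show the averaging step.

```python
from statistics import median

# Test grades from the example above (percentages)
grades = [34, 88, 34, 50, 78]

# Step 1: order the scores from lowest to highest
ordered = sorted(grades)  # [34, 34, 50, 78, 88]

# Step 2: with an odd number of scores, take the middle one
odd_median = ordered[len(ordered) // 2]  # 50

# With an even number of scores, average the two middle scores.
# These four scores are hypothetical, not from the lecture.
even_scores = [34, 50, 78, 88]
even_median = (even_scores[1] + even_scores[2]) / 2  # (50 + 78) / 2 = 64.0

# The standard library gives the same results
print(odd_median, median(grades))
print(even_median, median(even_scores))
```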
Measures of central tendency: Mean
• The mean is the most frequently used of all the measures of central
tendency and forms the basis of many inferential statistical
techniques.
• It is known as the average score
• It is produced by adding all the scores together and dividing by the
number of observations (N).
…continued
• The formula below expresses the calculation of the mean
described on the previous slide:

$\bar{X} = \frac{\sum X}{N}$

• In statistics, the symbol X with a bar on top (X̄) is used to symbolize
the mean.
• The symbol Σ (sigma, the summation sign) stands for “sum of” while X is used
to indicate the scores.
• Thus, ΣX refers to the sum of all the scores.
…continued
• If we had the following scores:
5, 3, 6, 7, 7, 9, 6
The mean would be calculated as follows (using the formula):

$\bar{X} = \frac{\sum X}{N} = \frac{43}{7} \approx 6.14$

First add all the scores to get the total (= 43). Then divide by the total
number of observations (in this case 7).
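The same calculation can be sketched in Python, using the seven scores from the example. The variable names are chosen for illustration only.

```python
# Scores from the example above
scores = [5, 3, 6, 7, 7, 9, 6]

total = sum(scores)  # sum of all the scores: 43
n = len(scores)      # number of observations: 7
mean = total / n     # 43 / 7

print(total, n, round(mean, 2))  # 43 7 6.14
```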
…continued
• It is possible to calculate the mode, median and mean for grouped
data by using special formulae, but these produce only approximate
statistics since the results produced are not based on
actual scores
• It is therefore advised to avoid calculating measures of central
tendency for grouped frequency distributions.
Skewness – comparing the measures of
central tendency
• Skewness refers to the tendency for
scores to cluster at either side of the
distribution.
• In a normal distribution (C) the mean,
mode and median are exactly the
same.
• In a positively skewed distribution (A),
most scores occur at the lower end of
the distribution below the mean with
the median and mode less than the
mean.
• In a negatively skewed distribution
(B), most scores appear above the
mean and the mode and median are
also greater than the mean.
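To make the ordering of the three measures concrete, the sketch below computes them for two small data sets. Both sets are invented for this illustration: the first has a long tail towards the high end (positively skewed) and the second has a long tail towards the low end (negatively skewed).

```python
from statistics import mean, median, mode

# Hypothetical scores: most are low, with a tail towards the high end
positively_skewed = [1, 1, 1, 2, 2, 3, 4, 8]

# Hypothetical scores: most are high, with a tail towards the low end
negatively_skewed = [1, 5, 6, 7, 7, 8, 8, 8]

for label, scores in [("positive", positively_skewed),
                      ("negative", negatively_skewed)]:
    print(label, "mode:", mode(scores),
          "median:", median(scores),
          "mean:", mean(scores))
```

In the first set the mode and median fall below the mean, and in the second they fall above it, matching the pattern described above.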
Measures of variability
• Although measures of central tendency are extremely useful, they only
provide us with a partial description of our data.
• We also need to consider what scores contribute to a mean, mode or
median.
• For example, the mean of 30 and 30 is 30; however, the mean of 60 and 0
is also 30. Thus, to fully understand a data set, we also need to consider
measures of variability.
• A measure of variability provides an indication of how diverse or variable
the spread of scores is.
• When scores are spread out, the variability is high; when
scores are clustered together, the variability is low.
…continued
• The most common measures of variability are the range, the
variance, & the standard deviation.
• These are the most important descriptive statistics, as they form the
basis for most advanced statistical procedures
• The range indicates the distance between the highest and lowest
scores, whilst the other measures of variability relate to how far
scores vary from a typical score (i.e., the mean).
…continued
So, what is the range?
• The range is defined as the highest
score minus the lowest score.
• In our gym attendance example,
the highest score is 47 and the
lowest is 19
• Therefore, the range is:

47 − 19 = 28

• Because the range depends on only two
scores in the distribution, it is
considered a very crude/simple
measure of variability and is not
used often.
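A short sketch of the range calculation follows. The helper name `score_range` is hypothetical. Only the highest and lowest gym attendance scores are given in the text, so the full data set is not reproduced here; the two pairs of scores come from the earlier example of equal means.

```python
def score_range(scores):
    """Return the highest score minus the lowest score."""
    return max(scores) - min(scores)


# Gym attendance example: only the two extreme scores are known from the text
gym_range = 47 - 19
print(gym_range)  # 28

# Two pairs of scores with the same mean (30) but very different spread
print(score_range([30, 30]))  # 0
print(score_range([60, 0]))   # 60
```

The last two lines show why the mean alone gives only a partial description: both pairs average 30, but their ranges differ.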
…continued
• Variance, on the other hand, uses all the scores in the distribution
• It is a measure of the average deviation from the mean of all the
scores in the distribution.
• The symbol X − X̄ is used to denote/show deviation from the mean
and it literally means score (X) minus mean (X̄).
• In an example where our scores are 2, 4, and 6, the mean would be
4.
• Therefore, the deviations of these scores would be:
2 − 4 = −2, 4 − 4 = 0, 6 − 4 = 2
…continued
• The variance represents the average of these deviations, basically the
sum of the deviations divided by the number of observations (N).
• If, however, we simply summed the deviations, we would find that they
equal 0
• To deal with this, we square all the deviation scores and
use these squared deviations in determining the variance
• Therefore, the variance can be defined (computationally) as:
variance = sum of the squared deviations ÷ (number of observations − 1)
…continued
• Note that N − 1 was used instead of N, as it has been shown that for small
samples, using N − 1 leads to greater accuracy
• Statistically, the previous computational definition can be written as:

$S^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$
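The steps described so far can be sketched in Python with the scores 2, 4 and 6 from the earlier example. The variable names are chosen for illustration, and `statistics.variance` is included only as a cross-check, since it also divides by N − 1.

```python
from statistics import variance

# Scores from the example above
scores = [2, 4, 6]
n = len(scores)
mean = sum(scores) / n  # 4.0

# Deviation of each score from the mean: score minus mean
deviations = [x - mean for x in scores]  # [-2.0, 0.0, 2.0]

# The raw deviations always sum to zero, so they cannot be averaged directly
sum_of_deviations = sum(deviations)  # 0.0

# Squaring removes the signs before summing
squared_deviations = [d ** 2 for d in deviations]  # [4.0, 0.0, 4.0]
sum_of_squares = sum(squared_deviations)           # 8.0

# Divide by N - 1 rather than N
sample_variance = sum_of_squares / (n - 1)  # 8.0 / 2 = 4.0

print(sample_variance, variance(scores))
```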
…continued
• Using the gym attendance example,
we see the variance calculated for a
select number of scores.
• We see, for instance that the score
19, minus the mean of 25.05 equals
the deviation score of -6.05. The
deviation score squared is equal to
36.60.
• The same process is followed for all
the scores.
• Once the squared deviations for all
the scores are produced, they are
summed, and for this example the
sum of the squared deviation scores
is equal to 2059.90
• The variance is then this sum divided by N − 1:

$S^2 = \frac{2059.90}{N - 1}$
Standard deviation
• One of the difficulties with variance
is that it is not based on the
original deviations but rather on
the squared deviations.
• Therefore, it is no longer in the
original unit of measurement (in
our example the original unit of
measurement was number of
times) and as such has no direct
interpretive meaning.
• Although it can be used to
compare two distributions it has no
significance when it is considered
on its own.
…continued
So how can we fix this problem?
• One solution would be to convert the
variance back into the original units of
measurement and this is achieved by taking
the square root of the variance.
• The square root of the variance is called the
standard deviation
• It tells us, on average, how far each
value/score lies from the mean, i.e., how
spread out from the center of the distribution
our data is on average.
• A high standard deviation means that values
are generally far from the mean, while a low
standard deviation indicates that values are
clustered close to the mean.
• Continuing with the gym attendance data: we
simply placed a square root sign above the
previous formula for variance
• In this example, it can be said (from the
results of the standard deviation) that on
average the various scores deviate 7.27 from
the mean
• The symbol S is used for standard deviation
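As a small sketch of this last step, the code below takes the square root of the variance for the scores 2, 4 and 6 used in the earlier variance example. The gym attendance data set is not reproduced in the text, so it is not used here. `statistics.stdev` is shown as a cross-check.

```python
import math
from statistics import stdev, variance

# Scores from the earlier variance example
scores = [2, 4, 6]

# Variance with N - 1 in the denominator, in squared units
sample_variance = variance(scores)  # 4

# The square root returns the measure to the original units
standard_deviation = math.sqrt(sample_variance)  # 2.0

print(standard_deviation, stdev(scores))
```

A standard deviation of 2 says that, on average, these scores lie about 2 units from the mean of 4.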
Homework
1. True or false? The vertical line is called the x-axis and the horizontal line is called the y-axis.
2. __________ provide a _______ index that describes the variability or spread of scores.
3. The graph is an example of a:
(A) Frequency polygon
(B) Bar chart
(C) Line graph
(D) Histogram

4. A group of seven people have taken a quiz and their scores are 8, 9, 10, 5, 8, 9, and 10. What is the standard deviation?
5. The _______ is normally used for _______, ________ or data that is grouped in different classes.
6. A distribution is unimodal when:
(A) it has only one major peak
(B) it is symmetrical
(C) the mean, mode and the median are equal
(D) both A and B
7. You are given the following scores: 3, 5, 7, 12, 13, 14, 21, 23, 23, 23, 23, 29, 40, 56. What is the median of the scores?
8. True or false? A high standard deviation indicates that values are clustered very close to the mean.
