BAP SMPR Unit 1
Text:
§ King, B.M. & Minium, E.W. (2007). Statistical Reasoning in the Behavioral Sciences (5th Ed.). Noida: John Wiley.
Additional References:
§ Garrett, H.E. & Woodworth, R.S. (1987). Statistics in Psychology and Education. Mumbai: Vakils, Feffer & Simons Pvt. Ltd.
§ Mangal, S.K. (2012). Statistics in Psychology and Education (2nd Ed.). New Delhi: Prentice Hall of India.
§ What is Psychological Research? Relevance of Statistics in Psychological Research, Descriptive and Inferential Statistics
§ Scales of Measurement (Nominal, Ordinal, Interval, Ratio)
§ Graphical Representation of Data (Histogram, Frequency Polygon, Cumulative Percentage Curve, Bar Diagram, Pie Chart)
What is Psychological Research?
§ Defined as “the scientific study of behaviour, mental processes and experiences of individuals”.
§ About answering questions.
§ Whose? Yours!
§ Does stress influence performance?
§ Does online communication affect well-being?
§ Does exposure to media images influence body image?
A systematic, empirical, critical investigation that is structured to answer questions about the behaviour and experiences of individuals.
Basis of Difference: Quantitative Research vs. Qualitative Research
Research Path
Quantitative: Clearly set linear path and successive procedures that seem to follow in a logical sequence.
Qualitative: Non-linear research path that permits and obligates the researcher to go in cyclical, back and forth, and non-successive sequences.
Sample
Quantitative: Usually a large number of cases representing a population of interest.
Qualitative: Usually a small group of respondents.
Research Question
Quantitative: Questions are finalized before the study and are used in developing steps and guiding the study.
Qualitative: May start out with a vague or poorly defined research question which may evolve as the study progresses and new insights are gained and incorporated.
Research Methods
Quantitative: Use highly structured methods such as questionnaires, surveys and structured observation.
Qualitative: Use semi-structured methods such as in-depth interviews, focus groups and participant observation.
Purpose of Research
Quantitative: Primarily deductive; used to test pre-specified concepts, constructs, and hypotheses that make up a theory.
Qualitative: Primarily inductive; used to formulate theory or hypotheses.
Focus
Quantitative: Stress is on objectivity.
Qualitative: Allows subjectivity; researchers can employ personal insights, feelings, and human perspectives to understand social life.
§ Determine the extent of relationships among variables: Most psychological research is intended to examine relationships between two or more variables.
§ Using statistics, researchers can measure the direction and strength of the relationship between variables as they exist naturally.
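As a rough illustration of "direction and strength", the sketch below computes a Pearson correlation coefficient by hand. The scores and the names `stress`, `performance` and `pearson_r` are hypothetical, made up for this example; the sign of the result gives the direction and its absolute size the strength.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical scores for five participants (illustration only)
stress = [2, 4, 5, 7, 9]
performance = [9, 8, 6, 5, 2]

r = pearson_r(stress, performance)
direction = "negative" if r < 0 else "positive"
print(direction, round(abs(r), 2))
```

With these made-up scores the coefficient is negative and close to 1 in absolute size, i.e. higher stress goes with lower performance in this small sample. It says nothing about causation.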
§ Make inferences based upon data: By using what’s known as inferential statistics, psychologists can analyze and make inferences from the data to reach valid conclusions that extend beyond the immediate data.
§ This is important because we observe only a sample of elements, and reason from the particular facts or cases to draw general conclusions, a process called induction.
§ Thus, statistics help the researcher to answer the questions that initiated the research by determining exactly what general conclusions are justified based on the specific results that were obtained.
§ Statistics consists of univariate and multivariate procedures.
§ Psychologists use univariate procedures when they measure only one variable.
§ They use multivariate procedures when several variables are measured, for example to ascertain the relationship between two or more variables, to make inferences from the data, or to extract factors (latent variables).
§ A personality psychologist may use statistics to study individual differences in personality and their effects on behaviour.
§ A family counselor may use statistics to describe patient behavior and the effectiveness of a treatment program.
§ A sports psychologist may use statistics to analyze the performance of athletes.
§ A cross-cultural psychologist may use statistics to study differences among people from different cultures.
§ One reason it is important to study statistics is captured by a remark often attributed to Benjamin Disraeli, the 19th-century British statesman: “There are lies, damned lies and statistics.”
§ The point is that statistics can be deceiving, and so can interpreting them.
§ The purpose of descriptive statistics is to organize and to summarize observations so that they are easier to comprehend.
§ Descriptive statistics is important because if we simply presented our raw data it would be hard to visualize or understand what the data was showing, especially if there was a lot of it.
§ Descriptive statistics therefore enables us to present the data in a more meaningful way, which allows simpler interpretation of the data.
§ Descriptive statistics makes use of tabular, graphical and numerical techniques to present the data.
§ It is used to describe data in terms of:
§ Distribution (frequency tables, bar graphs, pie charts etc.)
§ Center (measures of central tendency: mean, median, mode)
§ Spread (measures of variability: range, standard deviation, variance)
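The measures of center and spread listed above can be sketched with Python's standard `statistics` module. The ten scores are hypothetical. The sketch assumes the population formulas (`pvariance`, `pstdev`, dividing by N); a course may instead use the sample versions that divide by N - 1 (`variance`, `stdev`).

```python
import statistics

# Hypothetical raw scores for ten students (illustration only)
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8, 4]

# Center: measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Spread: measures of variability
score_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)   # population variance (divide by N)
std_dev = statistics.pstdev(scores)       # population standard deviation

print(mean, median, mode, score_range, variance, std_dev)
```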
Some examples:
§ A teacher summarizes the general performance of a student across a wide range of course experiences using the Grade Point Average (GPA).
§ A sports fan ranks a team’s players according to their batting averages.
§ A college professor administers a test of statistical competence and summarizes the scores to interpret the competence level of her students.
§ An organization obtains the average number of sales made by its telemarketing executives to assess their performance.
§ The purpose of inferential statistics is to draw a conclusion (an inference) about conditions that exist in a population (the complete set of observations) from the study of a sample (a subset) drawn from the population.
§ Population includes all observations (e.g. students’ scores, people’s incomes, etc.) in which the researcher is interested.
§ Sample is a carefully chosen subset of the population. We use the sample to infer something about the characteristics of the population.
§ Typically, the research process begins with a question about a population parameter. However, data is obtained from a subset of the population and a sample statistic is computed from the data to make an inference about the population parameter.
§ Parameters are the real entities of interest, and the corresponding statistics are guesses at reality.
§ Thus, we want to infer something about the characteristics of the population (parameter) from what we know about the characteristics of the sample (statistic).
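A minimal sketch of the parameter/statistic distinction, under stated assumptions: the "population" here is an artificial list of the whole numbers 1 to 1000, and the sample is a simple random sample of 100 drawn with a fixed seed. In real research the population mean is unknown, which is the whole reason for sampling.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: every whole-number score from 1 to 1000
population = list(range(1, 1001))
parameter = statistics.mean(population)   # population mean (the parameter)

# A simple random sample of 100 cases drawn without replacement
sample = random.sample(population, 100)
statistic = statistics.mean(sample)       # sample mean (the statistic)

print("parameter:", parameter)
print("statistic:", statistic)
```

The sample mean will usually land near the population mean but not exactly on it. Inferential statistics is concerned with how large that gap is likely to be.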
Fig: The steps used in statistical inference
Some examples:
§ A pollster asks a group of voters how they intend to vote in the upcoming elections.
§ A medical researcher examines the claim that large doses of vitamin C can help prevent the common cold.
§ A social psychologist evaluates the influence of media exposure on body image among teenage students.
§ A researcher tests a new diet drug on a group of overweight individuals.
§ A personality psychologist studies gender differences in emotional expressiveness.
Descriptive Statistics vs. Inferential Statistics
Aim
Descriptive: Aim of descriptive statistics is to summarize the current dataset.
Inferential: Aim of inferential statistics is to draw conclusions about a population outside of the obtained dataset.
Scope
Descriptive: Descriptive statistics usually operate within a specific area that contains the entire target population.
Inferential: Inferential statistics takes a sample of a population, especially if the population is too big to conduct research, or when we don’t have access to the entire population.
Conclusions
Descriptive: It does not allow us to make conclusions beyond the data we have analyzed.
Inferential: It allows us to make conclusions beyond the immediate data we have analyzed.
Please note: Even when a data analysis draws its main conclusions using
inferential statistics, descriptive statistics are also generally presented.
SCALES OF MEASUREMENT
§ Measurement is a process of systematically assigning values (numbers or scores) to properties of people, places, things, or events.
§ The way the values are assigned determines the level/scale of measurement. It refers to the amount of information the measurement procedure can convey about the actual quantity of the variable present and about the differences in individuals with different scores.
§ The levels of measurement differ both in terms of the meaning of the numbers used in the measurement system and in the types of statistical procedures that can be applied appropriately to data measured at each level.
§ Nominal scale is the simplest of the four levels of measurement.
§ It involves the process of placing observations into categories that differ in some qualitative aspect.
§ The categories used for classification must be mutually exclusive (observations cannot fall into more than one category) and exhaustive (there must be enough categories for all the observations).
§ Variables that are qualitative, or categorical in nature, are usually measured on a nominal scale, because we merely assign category labels.
§ For example, gender, political affiliation, or eye color are qualitative variables that can be measured on a nominal scale.
§ With a nominal scale, there is no question about one category having more or less of any particular quality; all categories are simply different.
§ Although the categories on a nominal scale are not quantitative values, they are occasionally represented by numbers. However, the numbers are only arbitrary and do not designate “more” or “less” of anything (e.g., 1 = Red, 2 = Blue, 3 = Green, 4 = Yellow).
§ For example, the rooms or offices in a building may be identified by numbers. But the room numbers are simply names and do not reflect any quantitative information. Room 109 is not necessarily bigger than Room 100 and certainly not 9 points bigger.
§ It also is fairly common to use numerical values as a code for nominal categories when data are entered into computer programs.
§ For example, the data from a survey may code males with a 0 and females with a 1. Again, the numerical values are simply names and do not represent any quantitative difference.
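The point about numeric codes can be sketched in a few lines. The eight coded responses below are hypothetical, and the 0/1 coding follows the survey example above. Counting categories and finding the modal category are meaningful, while the arithmetic mean of the codes can be computed but describes nothing about the categories themselves.

```python
from collections import Counter
import statistics

# Hypothetical survey responses coded as in the example: 0 = male, 1 = female
codes = [0, 1, 1, 0, 1, 1, 0, 1]
labels = {0: "male", 1: "female"}

# Meaningful for nominal data: frequencies and the modal category
counts = Counter(labels[c] for c in codes)
modal_category = counts.most_common(1)[0][0]

# Computable but not interpretable as an amount of anything:
# recoding the categories as 1 and 2 would change this number
meaningless_mean = statistics.mean(codes)

print(counts, modal_category, meaningless_mean)
```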
§ At the next level of complexity is the ordinal scale (the Latin root means “order”). In this type of measurement, the categories must still be mutually exclusive and exhaustive, but they also indicate the order of magnitude of some variable.
§ With a nominal scale, the outcome of classification is a set of unordered categories. With the ordinal scale, it is a set of ordered categories or ranks.
§ Thus, an ordinal scale constitutes a form of ordering or ranking of responses along some underlying dimension that expresses “more” or “less” of something.
§ An example is: instructor, assistant professor, associate professor, and professor.
§ We may use numbers for the ranks but it is not necessary. Among persons ranked 1, 2, and 3, the first person has a greater degree of merit than the person ranked second, and the second person has greater merit than the third.
§ However, the interval between two successive ranks is indeterminate. Consequently, the difference between any two consecutive ranks (e.g. ranks 1 and 2) may not be the same as that between another pair of consecutive ranks (e.g. ranks 2 and 3). Thus, ordinal measurements describe order but not the relative size or degree of difference between the adjacent steps on the scale.
§ Also, nothing is implied about the absolute level of merit. All observations may denote excellence, or they could just be average.
§ The next major level of complexity is the interval scale.
§ This scale has all the properties of the ordinal scale, but with the further refinement that a given interval (distance) between scores has the same meaning anywhere on the scale.
§ Thus, the interval scale not only tells us about the ordering of observations but also indicates the distance between them.
§ It allows us to know how many units greater than, or less than, one observation is from another on the measured characteristic.
§ Examples of this type of scale are degrees of temperature on the Fahrenheit or Celsius scales. A 10° rise in a reading on the Celsius scale represents the same change in heat when going from 0° to 10° as when going from 20° to 30°.
§ However, the zero point on interval scales is an arbitrary reference point; the value of 0 is assigned to a particular location on the scale simply as a matter of convenience or reference. Thus, a value of zero does not indicate a total absence of the quality being measured. In the Celsius scale, 0°C is the freezing point of water and does not imply an absence of heat.
§ Also, it is not possible to speak meaningfully about a ratio between two measurements on an interval scale. Therefore, we cannot assert that a temperature of 100° Celsius is twice as hot as one of 50°, or that a rise from 90° to 99° Celsius is a 10% increase.
§ A ratio scale possesses all the properties of an interval scale and in addition has an absolute zero point at which there is a total absence of the characteristic being measured.
§ It is therefore possible to speak meaningfully about a ratio between two measurements.
§ The Kelvin scale has an absolute zero, the point at which a substance would have no molecular motion and, therefore, no heat. Thus, 100 K is twice as hot as 50 K on the Kelvin scale.
§ Other examples of ratio scale measurements are length, weight, and measures of elapsed time.
§ Not only is the difference between 40 in. and 41 in. the same as the difference between 80 in. and 81 in., but it is also true that 80 in. is twice as long as 40 in.
A Ratio Scale
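The contrast between interval and ratio scales can be checked with a little arithmetic. The sketch below assumes the standard conversion K = °C + 273.15; the helper name `celsius_to_kelvin` is made up for this example. Differences mean the same thing anywhere on the Celsius scale, but the ratio of two Celsius readings is not the ratio of the underlying temperatures.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to kelvins (absolute zero = 0 K)."""
    return celsius + 273.15

# Equal intervals: a 10-degree rise is the same size anywhere on the scale
diff_low = 10 - 0
diff_high = 30 - 20

# Ratios: dividing Celsius readings suggests "twice as hot" ...
naive_ratio = 100 / 50

# ... but on the Kelvin (ratio) scale the same two temperatures
# differ by a factor of only about 1.15
true_ratio = celsius_to_kelvin(100) / celsius_to_kelvin(50)

print(diff_low == diff_high, naive_ratio, round(true_ratio, 3))
```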
Each scale of measurement satisfies one or more of the following properties of measurement.
§ Identity. Each value on the measurement scale has a unique meaning.
§ Magnitude. Values on the measurement scale have an ordered relationship to one another. That is, some values are larger and some are smaller.
§ Equal intervals. Scale units along the scale are equal to one another.
§ Absolute zero. The scale has a true zero point, below which no values exist.
§ In a nominal scale, numbers are assigned to categories as "names". Which number is assigned to which category is completely arbitrary. Therefore, the number only gives us the identity of the category assigned.
§ Ordinal scales have the property of magnitude as well as identity. The numbers represent a quality being measured (identity) and can tell us whether a case has more of the quality measured or less of the quality measured than another case (magnitude). The distance between scale points is not necessarily equal.
§ The interval scale of measurement has the properties of identity, magnitude, and equal intervals. With an interval scale, you know not only whether different values are bigger or smaller, you also know how much bigger or smaller they are.
§ The ratio scale of measurement satisfies all four of the properties of measurement: identity, magnitude, equal intervals, and an absolute zero.
§ In the behavioral sciences, there are many measuring instruments that lack equal intervals and an absolute zero point.
§ Consider, for example, a spelling test. Items might be words such as garden, baseball, and rowboat. A score of zero on this test means that the person could not spell the simplest word on the list, but what if simpler words had been on the test, such as cat, run, and bat?
§ Our spelling test, then, does not have an absolute zero point because zero on the spelling test does not indicate a total absence of spelling ability.
§ The same is true of midterm tests, IQ tests, the SAT, and almost all other tests of mental performance.
§ What about equal intervals? To have equal intervals on our spelling test, we should be able to state quantitatively just how much more spelling ability is needed to spell garden than to spell cat.
§ However, there is no objective way to determine when equal numerical intervals on a mental test represent equal increments in performance.
§ Hence, some people have argued that calculating certain statistics (such as averages) on tests of mental abilities could be seriously misleading.
§ Fortunately, the weight of the evidence suggests that in most situations, making statistical conclusions is not seriously hampered by uncertainty about the scale of measurement.
§ However, there are several areas where we need to be aware of scale problems to avoid taking tempting but erroneous positions.
§ For example, we should not say that a person with an IQ of 150 is twice as bright as one with an IQ of 75.
§ Similarly, we should not assume that the difference between 15 and 25 points on a spelling test necessarily represents the same increment in spelling ability as the difference between a score of 30 and 40 points on the same test.
§ In psychological measurement, this problem may be particularly critical when a test does not have enough “top” or “bottom” to differentiate adequately among the group measured.
§ For example, imagine a test of ability that has a maximum possible score of 50 points and that is too easy for the group measured.
§ For two persons who score 50 points, the score for one may indicate the maximum level of achievement, but the second person may be capable of a much higher level of performance.
§ The measuring instrument is simply incapable of showing this difference because it does not include items of greater difficulty.
The level of measurement of a variable tells us which statistics are permissible and appropriate.
DESCRIPTIVE STATISTICS
NOMINAL: Frequency tables, Mode
ORDINAL: Frequency tables, Percentiles, Mode, Median, Range
INTERVAL & RATIO: Frequency tables, Mode, Median, Mean, Range, Variance, Standard Deviation
Statistical tests are divided into two types: parametric and non-parametric tests. Parametric tests are more powerful, but because they include particular mathematical operations on the values, they can be used only with interval or ratio data. Ordinal and nominal data require the use of non-parametric tests.
INFERENTIAL STATISTICS
NOMINAL: Non-parametric tests: Chi-square test
ORDINAL: Non-parametric tests: Rank-order correlation, Mann-Whitney U test, Kruskal-Wallis test, Friedman’s ANOVA
INTERVAL AND RATIO: Parametric tests: Pearson’s correlation coefficient, t-test, ANOVA, Regression, Factor analysis
GRAPHICAL REPRESENTATION OF DATA
§ Frequency distributions present the main features of data succinctly, but they are still abstract numerical representations and require effort to interpret.
§ Graphs can impart the same information and speak to us more directly by pictorially/visually presenting the pertinent features of the data.
§ Their ease of interpretation makes them particularly useful when we want to present data to the general public.
§ A well-formatted graph helps in visually illustrating certain characteristics and trends in a set of data.
§ A study that compared the use of frequency tables and graphs in science concluded that graphs are highly preferred because “of their relative readability, ease of comprehension, combinability, and overall rhetorical effectiveness” (Smith et al., 2002).
§ There are many ways to graph data. We will be discussing the five most common graphs: bar graphs, pie charts, histograms, frequency polygons, and cumulative frequency graphs.
§ Qualitative variables are usually represented by bar graphs and pie charts.
§ Quantitative variables are represented by histograms, frequency polygons, and cumulative frequency graphs.
§ Graphed frequency distributions generally have two perpendicular lines called axes: X-axis (horizontal axis, abscissa), Y-axis (vertical axis, ordinate).
§ The measurement scale (set of X values or categories) is listed along the X-axis (with values increasing from left to right for quantitative variables).
§ The frequencies (or some function of frequency) are listed on the Y-axis with values increasing from bottom to top.
§ As a general rule, the point where the two axes intersect should have a value of zero for both the scores and the frequencies. If it does not represent a 0, we must make a break in the X-axis to indicate that a portion of the scale is missing.
§ The graph should be constructed so that its height (Y-axis) is approximately three-quarters (3/4th) of its width (X-axis).
§ The graph should have an informative title.
§ Both the axes should have appropriate labels.
§ A histogram is the most commonly used graph to show frequency distributions.
§ It is a chart that plots the frequency distribution of a numeric variable as a series of adjacent bars/rectangles, each of which represents the scores in one of the class intervals of the distribution.
§ The two vertical boundaries or the edges of the bar coincide with the real limits of the particular class interval.
§ The height of a bar represents the frequency of scores for that class interval. We can use either frequencies or relative frequencies.
Steps in construction
Step 1: Construct a frequency distribution.
Step 2: Decide on a suitable scale for X-axis by identifying and adding 2 class intervals falling immediately outside the end class intervals. The number of class intervals thus obtained will be the number of squares required for the width of the graph.
Step 3: Decide on a suitable scale for Y-axis by multiplying the width by ¾ or .75 to find the approximate number of squares for the graph’s height.
Step 4: Draw bars of equal width for each class interval in such a way that the height of a bar corresponds to the frequency or relative frequency of that interval. There should be no gaps. The shared edge between two adjacent bars represents both the upper real limit of one interval and the lower real limit of the next higher interval.
Step 5: Identify the class intervals by using either real limits or mid-points. If you use real limits, place them under the edge of each bar. If you use mid-points, place them under the middle of each bar.
Step 6: Label the axes and give the histogram a title.
Note: The midpoint can be found by averaging the highest and the lowest scores in the interval.
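Steps 1 to 3 above can be sketched as follows. This is only an illustration under assumptions: the ten raw scores are hypothetical whole numbers, and the function names `grouped_frequency` and `graph_size_in_squares` are made up for this example. Each row gives what a histogram bar needs: real limits for its edges, a midpoint for its label, and a frequency for its height.

```python
def grouped_frequency(scores, lowest, width):
    """Group whole-number scores into class intervals of equal width.

    Returns one row per interval, lowest interval first.
    """
    rows = []
    lower = lowest
    while lower <= max(scores):
        upper = lower + width - 1
        rows.append({
            "apparent": (lower, upper),            # apparent limits
            "real": (lower - 0.5, upper + 0.5),    # real limits (bar edges)
            "midpoint": (lower + upper) / 2,       # average of lowest and highest score
            "f": sum(1 for s in scores if lower <= s <= upper),
        })
        lower += width
    return rows

def graph_size_in_squares(n_intervals):
    """Width = intervals plus one empty interval at each end; height = 3/4 of width."""
    width_squares = n_intervals + 2
    return width_squares, 0.75 * width_squares

# Hypothetical raw scores (illustration only)
scores = [60, 61, 66, 67, 68, 70, 72, 75, 75, 76]
rows = grouped_frequency(scores, lowest=60, width=3)
width_squares, height_squares = graph_size_in_squares(len(rows))

for row in rows:
    print(row["apparent"], row["real"], row["midpoint"], row["f"])
```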
Results from 50 Students on the History Class Midterm
Examination
Histogram of grouped history midterm exam scores
§ A frequency polygon is exactly like a histogram except that points are drawn rather than bars.
§ These points are plotted above the mid-point of each class interval at a height equal to the frequency or relative frequency of scores in that interval.
§ The points are then connected by straight lines.
§ To ensure that the graph is truly a polygon (i.e., the graph is a closed figure), we generally bring the straight line down to the mid-points of the additional first and last class intervals with zero frequencies.
Steps in construction
Step 1: Construct a frequency distribution.
Step 2: Decide on a suitable scale for X-axis and Y-axis.
Step 3: Label the class interval mid-points along the X-axis.
Step 4: Place a dot above the mid-point of each class interval at a height equal to the frequency or relative frequency of the scores in that interval.
Step 5: Connect the dots with straight lines.
Step 6: Label the axes and give the polygon a title.
Frequency polygon of grouped history midterm exam scores
§ Ordinarily, we bring the polygon down to the horizontal axis at both ends.
§ To do so, identify the two class intervals falling immediately outside those end class intervals containing scores. The midpoints of these intervals, plotted at zero frequency, are then connected to the graph.
§ It was done for the interval 57–59 because scores in this interval were possible, though not obtained.
§ However, what do we do when scores in the next adjacent class interval are not possible?
§ The best thing to do in a case like this is to leave the dot “dangling.”
§ For the interval 96–98 the graph is left “dangling” because scores greater than 100 were not possible on the history exam and the next adjacent interval is 99–101.
§ Bringing the polygon down to 0 at the midpoint of the interval 99–101 may mislead someone looking at the graph to think that scores in that interval were possible.
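The plotting rule, including the "dangling" end, can be sketched with the grouped history exam frequencies (intervals 60–62 up to 96–98, width 3) that are tabulated with the cumulative percentage curve in these notes. The variable names and the `max_possible` check are assumptions made for this illustration, not a prescribed procedure.

```python
# Frequencies for the intervals 60-62, 63-65, ..., 96-98 (lowest first)
freqs = [2, 0, 3, 4, 3, 4, 8, 6, 10, 7, 2, 0, 1]
lowest, width = 60, 3
min_possible, max_possible = 0, 100   # possible scores on the exam (assumed range)

# One point per class interval: (midpoint, frequency)
points = []
for i, f in enumerate(freqs):
    lower = lowest + i * width
    upper = lower + width - 1
    points.append(((lower + upper) / 2, f))

# Low end: the interval 57-59 was possible, so close the polygon at zero
low_lower = lowest - width
if low_lower >= min_possible:
    points.insert(0, ((low_lower + low_lower + width - 1) / 2, 0))

# High end: the next interval is 99-101; scores above 100 are not possible,
# so no zero point is added and the last dot is left "dangling"
high_lower = lowest + len(freqs) * width
high_upper = high_lower + width - 1
if high_upper <= max_possible:
    points.append(((high_lower + high_upper) / 2, 0))

print(points[0], points[-1])
```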
§ Both the histogram and the polygon are used for graphing quantitative data on an interval or ratio scale.
§ You can see the similarities between the two graphs by superimposing a histogram and polygon of the same set of data, as shown in the next slide.
§ Nevertheless, there are occasions when one may be preferred over the other.
Frequency polygon of grouped history exam scores
superimposed on a histogram of the same scores
§ A histogram is often used when graphing an ungrouped frequency distribution of a discrete variable (or data treated as a discrete variable).
§ The general public seems to find a histogram a little easier to understand than a polygon, and hence it may be a good choice for communicating with them.
§ A histogram also has some merit when displaying relative frequency. The total area in a histogram represents 100% of the scores, and thus the area in the bars of a histogram is directly representative of relative frequency. That is, the area in any rectangle is the same fraction of the total area of the histogram as the frequency of that class interval is of the total number of cases in the distribution.
§ However, representing frequencies by bars suggests that the scores are evenly distributed within each class interval and that the borders of the intervals are points of decided change. The horizontal top of each rectangular bar is responsive only to what occurs in one class interval. Therefore it does not represent a trend of increasing or decreasing frequency accurately.
§ Also, when comparing two or more distributions, horizontal lines in the histogram will often coincide, creating considerable confusion. Therefore, it is not used for comparisons.
§ A frequency polygon is often preferred for a grouped frequency distribution of a continuous variable because it shows the gradual change over a wide range of scores and suggests continuity of the variable.
§ The direction of the straight line in a frequency polygon is determined by the frequencies in two adjacent class intervals. Therefore, if a definite trend of increasing or decreasing frequencies exists over a span of several consecutive class intervals, it will represent this trend directly.
§ Frequency polygons are particularly helpful when comparing two or more distributions. When distributions are based on different numbers of cases, we can equalize that difference by using relative frequencies rather than raw frequencies.
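A small sketch of why relative frequencies equalize groups of different sizes. The two sets of class-interval frequencies are hypothetical, chosen so that the groups have the same shape but different numbers of cases. The raw frequencies look very different, while the relative frequencies coincide.

```python
def relative_frequencies(freqs):
    """Convert raw class-interval frequencies to proportions of the total."""
    n = sum(freqs)
    return [f / n for f in freqs]

# Hypothetical frequencies over the same five class intervals
group_a = [5, 10, 20, 10, 5]      # 50 cases
group_b = [20, 40, 80, 40, 20]    # 200 cases

rel_a = relative_frequencies(group_a)
rel_b = relative_frequencies(group_b)

print(rel_a)
print(rel_b)
```

Plotting `rel_a` and `rel_b` as polygons on the same axes would give two overlapping lines, whereas polygons of the raw frequencies would differ in height by a factor of four.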
OkCupid 20 6.8
Hinge 8 2.7
Grindr 6 2.0
Example of a bar diagram for qualitative data with sub-categories
Frequency distribution of preference for a season among
a sample of 100 students
Seasons f
Winter 72
Autumn 15
Spring 10
Summer 3
n = 100
§ A pie chart is a circular graph whose pieces add up to 100%.
§ Like the bar diagram, it is useful for presenting qualitative data.
§ However, unlike the bar diagram, in which results can be expressed as either raw frequencies or relative frequencies, pie charts always use relative frequencies.
§ The area in any piece of the pie is the same fraction of the pie as the frequency of that category is of the total number of cases in the distribution.
§ In some instances, a piece of the pie chart is exploded (moved slightly outward) to indicate that section of the pie is the most noteworthy.
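As a worked sketch, the season-preference frequencies given earlier (Winter 72, Autumn 15, Spring 10, Summer 3; n = 100) can be turned into pie slices. Expressing each slice as a central angle out of 360 degrees is an assumption about how one would draw the pie by hand, not a step stated in these notes.

```python
# Frequency distribution of preferred season (n = 100), from the table above
seasons = {"Winter": 72, "Autumn": 15, "Spring": 10, "Summer": 3}
n = sum(seasons.values())

# Each slice is the same fraction of the circle as its frequency is of n
slices = {
    name: {"percent": 100 * f / n, "degrees": 360 * f / n}
    for name, f in seasons.items()
}

for name, piece in slices.items():
    print(name, piece["percent"], round(piece["degrees"], 1))
```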
Frequency and relative frequency distribution of dating apps
used by the survey respondents
§ The cumulative percentage curve shows a small rise in intervals with relatively few scores and a sharp rise in intervals with many scores.
§ Many distributions have most of the cases in the middle portion of the distribution. This results in a cumulative percentage curve with an S-shaped figure called an ogive curve.
§ When the curve is carefully drawn and the scale divisions are precisely marked, we can determine percentiles and percentile ranks from the cum% curve. (Connecting the dots on the cumulative curve with straight lines is the graphic equivalent of assuming, as we did when computing, that scores are evenly spread throughout the interval.)
Apparent limits Real limits f cum f cum %
96-98 95.5-98.5 1 50 100
93-95 92.5-95.5 0 49 98
90-92 89.5-92.5 2 49 98
87-89 86.5-89.5 7 47 94
84-86 83.5-86.5 10 40 80
81-83 80.5-83.5 6 30 60
78-80 77.5-80.5 8 24 48
75-77 74.5-77.5 4 16 32
72-74 71.5-74.5 3 12 24
69-71 68.5-71.5 4 9 18
66-68 65.5-68.5 3 5 10
63-65 62.5-65.5 0 2 4
60-62 59.5-62.5 2 2 4
N=50
Cumulative percentage curve of the grouped history exam scores
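The cum f and cum % columns of the history exam table, and reading a percentile off the curve, can be sketched as below. The function name `percentile_point` is made up for this example. It uses linear interpolation within a class interval, which matches the assumption that scores are evenly spread throughout the interval.

```python
# Grouped history exam scores: lower real limits 59.5, 62.5, ..., 95.5
freqs = [2, 0, 3, 4, 3, 4, 8, 6, 10, 7, 2, 0, 1]   # lowest interval first
width = 3
lower_real = [59.5 + width * i for i in range(len(freqs))]
n = sum(freqs)

# Cumulative frequency and cumulative percentage at each upper real limit
cum_f = []
running = 0
for f in freqs:
    running += f
    cum_f.append(running)
cum_pct = [100 * c / n for c in cum_f]

def percentile_point(p):
    """Score below which p percent of the cases fall (linear interpolation)."""
    target = p / 100 * n          # number of cases that must fall below the point
    below = 0
    for lower, f in zip(lower_real, freqs):
        if f > 0 and below + f >= target:
            return lower + (target - below) / f * width
        below += f

print(cum_pct)
print(percentile_point(50))
```

The 50th percentile lands in the interval 80.5–83.5: 24 cases fall below 80.5, one more is needed, and one sixth of the way through a 3-point interval is 0.5 points above its lower real limit.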
X f cum f cum %
195-199 1 50 100
190-194 2 49 98
185-189 4 47 94
180-184 5 43 86
175-179 8 38 76
170-174 10 30 60
165-169 6 20 40
160-164 4 14 28
155-159 4 10 20
150-154 2 6 12
145-149 3 4 8
140-144 1 1 2
N=50
§ Grouping: There is no such thing as the graph of a given set of data. The same set of raw scores may be grouped in different ways, and the grouping will affect the graph of the distribution.
§ For example, the two graphs (a) and (b) show the results of the history exam scores, but grouped with interval widths of 2 and 4, respectively.
§ Compare the appearance of these two graphs with each other and with the figure, where i = 3.
When i =2
When i =4
When i =3
§ Relative scale: The decision about relative scale is arbitrary, and the resulting graph can be squat or slender depending on the choice.
§ The large difference in appearance created by differences in scale is the reason for the convention that the height of the figure should be about three-quarters of the width.
§ Scale used for the Y-axis: Even if the convention regarding height and width is followed, the same data can appear very different when graphed, depending on the scale of measurement used for frequency.
§ You will sometimes see graphs with a break in the vertical axis. Never do this! Frequency on the vertical axis should always be continuous from zero. When we put a break in the axis, we lose the proportional relationship among class interval frequencies.
Certain shapes of frequency distributions occur with enough regularity in statistical work that they have names. The names effectively summarize the general characteristics of the distribution.
§ A graph with three or more humps, each with the same maximum frequency, is multimodal.
§ Technically, a distribution is bimodal or multimodal only if its humps have the same frequency. Nevertheless, distributions with pronounced but slightly unequal humps are commonly described as bimodal or multimodal.
§ The figure below shows a bell-shaped distribution. A specific type of bell-shaped distribution, called the normal curve, is of great importance in statistical inference.
§ Kurtosis refers to the degree of peakedness of a graphed distribution.
§ The normal distribution is mesokurtic; meso- means intermediate.
§ A distribution flatter than the normal curve is called platykurtic. It is called leptokurtic if it is more peaked than the normal curve.
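One common numerical index of kurtosis, offered here as an assumption since these notes describe kurtosis only graphically, is the fourth-moment measure m4 / m2² minus 3, which is 0 for the normal curve. The two small data sets and the function names below are hypothetical. Real samples drawn from a normal population will rarely give exactly 0, so the labels are a rough guide only.

```python
def excess_kurtosis(values):
    """Fourth-moment kurtosis index minus 3 (0 for the normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

def describe_kurtosis(values):
    k = excess_kurtosis(values)
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"

# Hypothetical data: a flat distribution and a sharply peaked one
flat = [1, 2, 3, 4, 5]
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]

print(describe_kurtosis(flat), describe_kurtosis(peaked))
```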