Chapter 2 - Describing The Data
Frequency distribution: a table containing the values of a variable (or a set of ranges within which the data falls) and the corresponding frequencies with which each value occurs (or frequencies with which data falls within each range).

• The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data.
Frequency Distribution: Discrete Data
• Discrete data: possible values are countable
Example: the number of cars waiting for a right turn at a certain intersection has been observed.

Number of cars   Frequency
0                44
1                24
2                18
3                16
4                20
5                22
6                26
7                30
Total            200

Relative Frequency
Relative Frequency: What proportion is in each category?

Number of cars   Frequency   Relative Frequency
0                44          .22
1                24          .12
2                18          .09
3                16          .08
4                20          .10
5                22          .11
6                26          .13
7                30          .15
Total            200         1.00

For 0 cars: 44 / 200 = .22, so 22% of the observations report that no car is waiting to turn right.
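The relative frequency column can be reproduced with a few lines of Python. This is a minimal sketch using the counts from the table; the variable names (frequencies, relative) are chosen here for illustration only.

```python
# Observed counts from the slide: number of cars waiting -> frequency.
frequencies = {0: 44, 1: 24, 2: 18, 3: 16, 4: 20, 5: 22, 6: 26, 7: 30}

total = sum(frequencies.values())  # total number of observations

# Relative frequency = frequency of the category / total observations.
relative = {cars: count / total for cars, count in frequencies.items()}

for cars, count in frequencies.items():
    print(f"{cars:<6}{count:>6}{relative[cars]:>8.2f}")
print(f"{'Total':<6}{total:>6}{sum(relative.values()):>8.2f}")
```

The relative frequencies always sum to 1, which is a quick check that no category was missed.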
Frequency Distribution: Continuous Data
• Continuous Data: may take on any value in some interval.
Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
(Temperature is a continuous variable because it could be measured to any degree of precision desired.)

Grouping Data by Classes
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 20)
• Compute class width: 10 (46/5 = 9.2, then round up)
• Determine class boundaries: 10, 20, 30, 40, 50, 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes.
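The grouping steps can be sketched in Python as follows. This is an illustrative sketch, not part of the original slides: it assumes each class includes its lower boundary and excludes its upper boundary (so 30 falls in the 30 to 40 class), and it takes the class width of 10 and the first boundary of 10 directly from the example.

```python
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

data = sorted(temps)                      # step 1: sort ascending
data_range = data[-1] - data[0]           # step 2: range = 58 - 12
num_classes = 5                           # step 3: chosen number of classes
min_width = data_range / num_classes      # 46 / 5 = 9.2
class_width = 10                          # step 4: convenient width >= min_width
first_boundary = 10                       # assumption: start just below the minimum

# Step 5: class boundaries and midpoints.
boundaries = [first_boundary + i * class_width for i in range(num_classes + 1)]
midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]

# Step 6: count observations per class (lower boundary inclusive).
counts = [0] * num_classes
for x in data:
    counts[(x - first_boundary) // class_width] += 1

for lo, hi, mid, n in zip(boundaries, boundaries[1:], midpoints, counts):
    print(f"{lo} but less than {hi}  midpoint {mid:>4}  frequency {n}")
```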
Frequency Distribution Example
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Histograms
• The classes or intervals are shown on the horizontal axis.
• No gaps between bars, since the data are continuous.
[Figure: histogram "Frequency Distribution"; horizontal axis: Class Midpoints (5, 15, 25, 35, 45, 55, More); vertical axis: Frequency]
• Choosing the classes requires user judgment.
• The goal is to create a distribution that is neither too "jagged" nor too "blocky".
• Goal is to appropriately show the pattern of variation in the data.
How Many Class Intervals?
• Many (Narrow class intervals)
  • ... frequency varies across classes.
[Figure: histogram with narrow class intervals; horizontal axis: Temperature (4 to 60, More); vertical axis: Frequency]
• Few (Wide class intervals)
  • may compress variation too much and yield a blocky distribution
[Figure: histogram with wide class intervals; vertical axis: Frequency]

General Guidelines
• Number of Data Points    Number of Classes
  50 – 100                 6 - 10
  – Class widths can typically be reduced as the number of observations increases.
  – Distributions with numerous observations are more ...
• The minimum class width is
    W = (Largest Value - Smallest Value) / Number of Classes

Stem and Leaf Diagram
METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves).
Example:
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

• Here, use the 10's digit for the stem unit:
    Stem | Leaf
  • 12 is shown as    1 | 2
  • 35 is shown as    3 | 5

• Completed Stem-and-leaf diagram:
    Stem | Leaves
    1    | 2 3 7
    2    | 1 4 4 6 7 7
    3    | 0 2 5 7 8
    4    | 1 3 4 6
    5    | 3 8
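A stem-and-leaf diagram for two-digit data like this can be built by splitting each value into its tens digit and its units digit. The following is a small sketch under that assumption (stem unit of 10); the variable names are illustrative.

```python
from collections import defaultdict

data = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
        32, 35, 37, 38, 41, 43, 44, 46, 53, 58]

# Map each stem (tens digit) to its list of leaves (units digits).
stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Because the data are sorted first, the leaves on each row appear in ascending order, and the length of each row shows the frequency for that stem.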
Categorical Data
• Bar charts and Pie charts are often used for qualitative (category) data.
Pie Chart Example
Current Investment Portfolio

Investment Type   Amount (in thousands $)   Percentage
Stocks            46.5                      42.27
Bonds             32.0                      29.09
CD                15.5                      14.09
Savings           16.0                      14.55
Total             110                       100
(Variables are Qualitative)

[Figure: pie chart; Stocks 42%, Bonds 29%, Savings 15%, CD 14%]
Percentages are rounded to the nearest percent.

Bar Chart Example
[Figure: bar chart "Investor's Portfolio"; categories: Stocks, Bonds, CD, Savings; axis: Amount in $1000's (0 to 50)]
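The percentages behind the pie chart follow directly from the amounts in the portfolio table. A minimal sketch, with variable names chosen here for illustration:

```python
# Amounts in thousands of dollars, from the portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())  # total portfolio value

# Share of each investment type, as a percentage of the total.
percentages = {name: 100 * amount / total for name, amount in amounts.items()}

for name, amount in amounts.items():
    pct = percentages[name]
    # Two decimals for the table, nearest whole percent for the pie labels.
    print(f"{name:<8}{amount:>6.1f}{pct:>8.2f}{round(pct):>5}%")
```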
[Figure: bar graph of % invested in each category (Stocks, Bonds, Savings, CD; left axis 0% to 40%) with a line graph of cumulative % invested (right axis 0% to 90%)]

Number of vehicles   Frequency
0                    44
1                    24
2                    18
3                    16
4                    20
5                    22
6                    26
7                    30
Total                200

[Figure: bar chart; horizontal axis: Number of vehicles turn right (0 to 7); vertical axis: Frequency (0 to 50)]
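The portfolio chart with bars for % invested and a line for cumulative % invested (a layout commonly called a Pareto diagram) needs the categories sorted from largest to smallest and a running total. The sketch below assumes the amounts from the investment portfolio example; the variable names are illustrative.

```python
from itertools import accumulate

# Amounts in thousands of dollars, from the investment portfolio example.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}
total = sum(amounts.values())

# Order categories from largest to smallest share (the bar order).
ordered = sorted(amounts.items(), key=lambda item: item[1], reverse=True)
percents = [100 * amount / total for _, amount in ordered]

# Running total of the percentages (the line graph).
cumulative = list(accumulate(percents))

for (name, _), pct, cum in zip(ordered, percents, cumulative):
    print(f"{name:<8}{pct:>7.2f}%{cum:>8.2f}%")
```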
Tabulating and Graphing Multivariate Categorical Data
Side-by-Side Chart Example
• Sales by quarter for three sales territories:

         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      59        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9

[Figure: side-by-side bar chart of East, West and North sales by quarter (1st Qtr to 4th Qtr); vertical axis 0 to 60]

Line Charts and Scatter Diagrams
• Line charts show values of one variable vs. time
  – Time is traditionally shown on the horizontal axis.
• Scatter Diagrams show points for bivariate data
  – one variable is measured on the vertical axis and the other variable is measured on the horizontal axis.
Line Chart Example

Year   Inflation Rate
1985   3.56
1986   1.86
1987   3.65
1988   4.14
1989   4.82
1990   5.40
1991   4.21
1992   3.01
1993   2.99
1994   2.56
1995   2.83
1996   2.95
1997   2.29
1998   1.56
1999   2.21
2000   3.36
2001   2.85
2002   1.58

[Figure: line chart "U.S. Inflation Rate"; horizontal axis: Year (1984 to 2002); vertical axis: Inflation Rate (%) (0 to 6)]

Scatter Diagram Example

Volume per day   Cost per day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

[Figure: scatter diagram "Production Volume vs. Cost per Day"; horizontal axis: Volume per Day (0 to 70); vertical axis: Cost per Day (0 to 250)]
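The points of the scatter diagram are simply the (volume, cost) pairs from the production table. The sketch below builds those pairs and checks, as a rough illustration only, whether cost rises every time volume rises; this simple check is an assumption made here for demonstration and is not a method given in the slides.

```python
volume = [23, 26, 29, 33, 38, 42, 50, 55, 60]          # volume per day
cost = [125, 140, 146, 160, 167, 170, 188, 195, 200]   # cost per day

# Each point: horizontal coordinate = volume, vertical coordinate = cost.
points = list(zip(volume, cost))

# Change in volume and in cost between consecutive points.
steps = [(v2 - v1, c2 - c1)
         for (v1, c1), (v2, c2) in zip(points, points[1:])]

# True when every increase in volume comes with an increase in cost.
rises_together = all(dv > 0 and dc > 0 for dv, dc in steps)

for v, c in points:
    print(f"volume {v:>3}  cost {c:>4}")
print("Cost increases whenever volume increases:", rises_together)
```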
Types of Relationships (continued)
• No Relationship
[Figure: two scatter diagrams showing no relationship; horizontal axis: X; vertical axis: Y]

Chapter Summary
• Data in raw form are usually not easy to use for decision making. Some type of organization is needed:
  ♦ Table   ♦ Graph
• Techniques reviewed in this chapter:
  – Frequency Distributions and Histograms
  – Bar Charts and Pie Charts
  – Stem and Leaf Diagrams
  – Line Charts and Scatter Diagrams.
Thank You