
Chapter 2

This document discusses organizing and visualizing data. It covers constructing tables and charts for both categorical and numerical data. For categorical data, it describes summary tables, contingency tables, bar charts, and pie charts. For numerical data, it discusses ordered arrays, frequency distributions, relative frequency distributions, and cumulative distributions. Graphical methods for presenting numerical data include histograms, line graphs, and scatter plots. The overall goal is to properly organize and present data to facilitate analysis and decision making.

Uploaded by

susannaktikyan

Visualizing and organizing data

Chapter 2
Organizing and Visualizing Data

Learning Objectives

In this chapter, you learn:


• To construct tables and charts for numerical data
• To construct tables and charts for categorical data
• The principles of properly presenting graphs
Organizing data

Data in raw form are usually not easy to use for decision making.
After defining the variables and collecting the data, the next step is to
organize the data in preparation for visualizing and analyzing them. The
techniques for organizing data depend on the type of variable.

Organizing Categorical Data


The Summary Table: A summary table presents data as frequencies or
percentages for each category and makes the differences among the
categories easy to see.

Form of Payment    Percentage (%)
Cash               20
Check              49
Online             23
Other               8

Fund Risk Level    Number of Funds    Percentage of Funds (%)
Below average      23                  26.44
Average            34                  39.08
Above average      30                  34.48
Total              87                 100.00
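As a minimal sketch, the percentage column of the fund risk level table can be recomputed from the counts. The counts come from the table; the function and variable names are illustrative assumptions.

```python
# Counts taken from the fund risk level summary table.
fund_counts = {"Below average": 23, "Average": 34, "Above average": 30}


def summary_table(counts):
    """Return {category: (frequency, percentage)}, percentages rounded to 2 places."""
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 2)) for cat, n in counts.items()}


table = summary_table(fund_counts)
for category, (n, pct) in table.items():
    print(f"{category:<15}{n:>5}{pct:>9.2f}")
```

Each percentage is the category frequency divided by the total number of funds (87), multiplied by 100.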

A contingency table (cross-tabulation) lets you study patterns that may exist
between the responses of two or more categorical variables.

Contingency Table Displaying Type of Fund and Whether a Fee Is Charged

                           FEE
TYPE                       YES    NO     Total
Intermediate government    30     48     78
Short-term corporate       25     68     93
Total                      55     116    171
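The row, column, and grand totals of a contingency table follow directly from the cell counts. The sketch below recomputes them for the fund type by fee table. The nested-dictionary layout is an assumption chosen for illustration.

```python
# Cell counts from the fund type by fee contingency table.
cells = {
    "Intermediate government": {"Yes": 30, "No": 48},
    "Short-term corporate": {"Yes": 25, "No": 68},
}

# Row totals: sum across the fee categories for each fund type.
row_totals = {fund: sum(fees.values()) for fund, fees in cells.items()}

# Column totals: sum across fund types for each fee category.
column_totals = {}
for fees in cells.values():
    for answer, n in fees.items():
        column_totals[answer] = column_totals.get(answer, 0) + n

grand_total = sum(row_totals.values())
print(row_totals, column_totals, grand_total)
```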

Education by Gender (count and % within gender)

                                  Male               Female             Total
Education level                   Count    %         Count    %         Count    %
Incomplete secondary education    384      13.40     475      15.40     859      14.40
Completed secondary education     1162     40.40     1141     36.90     2303     38.60
Secondary technical education     551      19.20     676      21.90     1227     20.60
Incomplete higher education       79       2.70      107      3.50      186      3.10
Completed higher education        680      23.60     682      22.00     1362     22.80
Post-graduate degree              20       0.70      12       0.40      32       0.50
Total                             2876     100.00    3093     100.00    5969     100.00
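The "% within gender" figures are row percentages: each count divided by the total for that gender. A minimal sketch using the counts from the education table follows. The helper name row_percentages is an assumption.

```python
# Counts per education level, in the order the table lists them.
levels = [
    "Incomplete secondary", "Completed secondary", "Secondary technical",
    "Incomplete higher", "Completed higher", "Post-graduate degree",
]
counts = {
    "Male": [384, 1162, 551, 79, 680, 20],
    "Female": [475, 1141, 676, 107, 682, 12],
}


def row_percentages(row):
    """Percentage of each cell within its own row, rounded to 1 decimal place."""
    total = sum(row)
    return [round(100 * n / total, 1) for n in row]


male_pct = row_percentages(counts["Male"])
female_pct = row_percentages(counts["Female"])

# The Total row is the column-wise sum of the two gender rows.
total_row = [m + f for m, f in zip(counts["Male"], counts["Female"])]
total_pct = row_percentages(total_row)

for level, m, f, t in zip(levels, male_pct, female_pct, total_pct):
    print(f"{level:<22}{m:>6}{f:>6}{t:>6}")
```

Comparing percentages within each gender, rather than raw counts, adjusts for the different group sizes (2876 males and 3093 females).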
Organizing Numerical Data

The Ordered Array: An ordered array arranges the values of a numerical
variable in rank order, from the smallest value to the largest value or vice versa.
– Shows the range (min to max)
– Gives some indication of variability within the range
– May help identify outliers (unusual observations)
– Becomes less useful as the data set grows large

Data in raw form (as collected):

31 24 26 28 19 18 19 27 16

Data in ordered array from smallest to largest:

16 18 19 19 24 26 27 28 31
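A short sketch of the same step in code, using the nine raw values above. The variable names are illustrative.

```python
# Raw values as collected.
raw = [31, 24, 26, 28, 19, 18, 19, 27, 16]

# Ordered array from smallest to largest, and the reverse order.
ascending = sorted(raw)
descending = sorted(raw, reverse=True)

# The ordered array makes the range (min to max) immediately visible.
value_range = (ascending[0], ascending[-1])
print(ascending, value_range)
```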

The Frequency Distribution: A frequency distribution summarizes
numerical values by tallying them into a set of numerically ordered classes.
A frequency distribution is a list or a table containing:
- class groupings (ranges within which the data fall)
- the corresponding frequencies with which data fall within each grouping
To create a useful frequency distribution, decide how many classes are appropriate for
your data and determine a suitable width for each class interval.

Determining the class interval width

$$\text{Interval width} = \frac{\text{Highest value} - \text{Lowest value}}{\text{Number of classes}}$$

Beginning salaries of Business School graduates, AMD

Salary Range                 Frequency
Up to 250,600                25
From 250,600 to 300,200      18
From 300,200 to 349,800      20
From 349,800 to 399,400      14
From 399,400 to 449,000      18
Total                        95
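The interval-width formula can be checked against the salary classes. The sketch below assumes a lowest salary of 201,000 AMD. That value is not stated in the source; it is inferred from the five equal-width classes ending at 449,000. The function names are illustrative as well.

```python
def interval_width(lowest, highest, num_classes):
    """Interval width = (highest value - lowest value) / number of classes."""
    return (highest - lowest) / num_classes


def class_boundaries(lowest, highest, num_classes):
    """Boundaries of equal-width classes from lowest to highest."""
    width = interval_width(lowest, highest, num_classes)
    return [lowest + i * width for i in range(num_classes + 1)]


LOWEST = 201_000   # assumption: inferred, not given in the source
HIGHEST = 449_000  # upper limit of the last class in the table
NUM_CLASSES = 5

width = interval_width(LOWEST, HIGHEST, NUM_CLASSES)
boundaries = class_boundaries(LOWEST, HIGHEST, NUM_CLASSES)
print(width, boundaries)
```

Under that assumption the width is 49,600 AMD, which reproduces the class limits 250,600, 300,200, 349,800, and 399,400 shown in the table.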

The Relative Frequency (Proportion) Distribution

The proportion, or relative frequency, is the number of values in each class divided by
the total number of values.

Beginning salaries of Business School graduates, AMD

Salary Range                 Frequency    % of total
Up to 250,600                25           26%
From 250,600 to 300,200      18           19%
From 300,200 to 349,800      20           21%
From 349,800 to 399,400      14           15%
From 399,400 to 449,000      18           19%
Total                        95           100%
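A minimal sketch of the relative-frequency calculation, using the class frequencies from the salary table. Rounding to whole percentages is an assumption made to match the table.

```python
# Class frequencies from the salary table, in class order.
frequencies = [25, 18, 20, 14, 18]
total = sum(frequencies)

# Proportion (relative frequency) = class frequency / total number of values.
proportions = [f / total for f in frequencies]

# Percentages rounded to whole numbers, as in the table.
percentages = [round(100 * p) for p in proportions]
print(total, percentages)
```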

The Cumulative Distribution

The cumulative percentage distribution presents the percentage of values
that are less than a specific amount.

Beginning salaries of Business School graduates, AMD

Salary Range                 Frequency    % of total    Cumulative %
Up to 250,600                25           26%           26%
From 250,600 to 300,200      18           19%           45%
From 300,200 to 349,800      20           21%           66%
From 349,800 to 399,400      14           15%           81%
From 399,400 to 449,000      18           19%           100%
Total                        95           100%
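The cumulative column can be sketched as a running total of the class frequencies, converted to a percentage of all values. The frequencies come from the salary table; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from the salary table, in ascending class order.
frequencies = [25, 18, 20, 14, 18]
total = sum(frequencies)

# Running total of frequencies up to and including each class.
cumulative_counts = list(accumulate(frequencies))

# Cumulative percentage = running total / total number of values.
cumulative_pct = [round(100 * c / total) for c in cumulative_counts]
print(cumulative_counts, cumulative_pct)
```

Accumulating the raw counts, rather than the already rounded percentages, keeps rounding error from building up across classes.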
Visualizing Data
Graphical presentation of Categorical Data

[Figure: bar chart and pie chart, "Popularity of programming languages (2018-2019)"]

The Side-by-Side Bar Chart: A side-by-side bar chart uses sets of bars to show
the joint responses from two categorical variables.
[Figure: distribution of grades by gender]

[Figure: the same data from another perspective]

Who is doing better, males or females?
Graphical presentation of Numerical Data

The Histogram: A histogram is a bar chart for grouped numerical data in which you use
vertical bars to represent the frequencies or percentages in each group. In a histogram,
there are no gaps between adjacent bars.

The class boundaries (or class midpoints) are shown on the horizontal axis.
The vertical axis is either frequency or relative frequency.
Bars of the appropriate heights represent the number of observations within each class.

[Figure: histogram, "Age of graduate students"; vertical axis: Frequency (0 to 8); horizontal axis: 5, 15, 25, 35, 45, 55, More]
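A rough sketch of how histogram bar heights are obtained: each value is assigned to an equal-width class and the classes are counted. The data are the nine values from the ordered-array example earlier in the chapter. The class start points and widths (5 and 10) are assumptions chosen for illustration, not taken from the slides.

```python
def histogram_counts(values, start, width, num_classes):
    """Count how many values fall in each class [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_classes
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < num_classes:  # values outside all classes are ignored
            counts[index] += 1
    return counts


values = [31, 24, 26, 28, 19, 18, 19, 27, 16]

# Narrow classes (width 5) starting at 15: 15-20, 20-25, 25-30, 30-35.
narrow = histogram_counts(values, start=15, width=5, num_classes=4)

# Wider classes (width 10) starting at 10: 10-20, 20-30, 30-40.
wide = histogram_counts(values, start=10, width=10, num_classes=3)

# Text rendering: one row per class, one '#' per observation.
for i, n in enumerate(narrow):
    lower = 15 + 5 * i
    print(f"{lower}-{lower + 5} | {'#' * n}")
```

The same data give counts of 4, 1, 3, 1 with width 5 and 4, 4, 1 with width 10, which shows how the choice of interval width changes the shape of the histogram.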
[Figure: histogram, interval width = 50]

[Figure: histogram, interval width = 20]
Visualizing Two Numerical Variables

Scatter Diagrams (Scatter Plots) are used to examine possible relationships between
two numerical variables.

In a scatter diagram:
– one variable is measured on the vertical axis and the other variable is measured on the
horizontal axis
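One common way to put a number on the relationship a scatter diagram shows is the Pearson correlation coefficient. The sketch below is an illustration, not part of the original slides: the paired values are hypothetical, and the helper pearson_r is an assumed name.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical paired observations (horizontal-axis and vertical-axis variables).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 4))
```

A value of r near +1 or -1 indicates a strong linear pattern in the plot. It does not show that one variable causes the other.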
[Figure: example of a spurious correlation]
Graphical presentation of time series

The Time-Series Plot: A time-series plot is used to study patterns in the values of
a variable over time. It plots the values of a numerical variable on the Y axis and
the time period associated with each value on the X axis.

[Figure: time-series plot, "Exchange rate"; Y axis: 200.00 to 700.00; X axis: dates from 2000-03-01 to 2010-03-01]
