(English) Charts Are Like Pasta - Data Visualization Part 1 - Crash Course Statistics #5 (DownSub - Com)

Uploaded by

Francisco Lopez Galán

Hi, I’m Adriene Hill, and this is Crash

Course Statistics.

So, for the last few episodes we’ve discussed

ways to summarize data using numbers.

We used measures of central tendency and measures

of spread.

But sometimes it can be helpful to actually

*see* your data in addition to having numbers

to describe it.

Data visualizations are important to understand

because you’ll see them every day.

In the news, on Facebook, in magazines.

Maybe I’ll make an infographic of all the

places we see data visualizations.

INTRO

There are two main types of data that we might

encounter: categorical and quantitative.

Quantitative data are quantities, numbers

that have both order and consistent spacing.

For example, how many ounces of olive oil

are in each American home.

If three families told you how many ounces

of olive oil they have, you could put them

in a meaningful order--from least to greatest,

or greatest to least.

This order also has consistent spacing, an

increase in 1 ounce of olive oil is the same

whether you go from 0 to 1 ounce, or from

100 to 101 ounces.

These properties allow us to do simple math

with the data--like taking the mean or calculating

the standard deviation.

Categorical data doesn’t have a meaningful

order or consistent spacing.

For example, favorite kind of pasta.

You might like penne, rotini, linguine, or

even Angel Hair, but there’s no objective

way to put those pastas into a meaningful

order.

Is penne truly better than linguine?

Where does rotini fit in?

It would be pasta madness to try to put them

in order.

The simplest way to display categorical data

is to make a frequency table.

A frequency table shows you all of the categories

and the number of data points that fall in

that category (in other words, its frequency).

To change a frequency table into a relative

frequency table, we just need to take each

raw frequency and divide by the number of

total points to get a decimal between 0 and 1.

Some of you may be used to reading decimals

as percentages, but if you’re not, just

multiply by 100 to get the percentage.

For linguine we have 10/50 which is 0.2 or

20% of the group.
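
As a rough sketch of that arithmetic in Python: only the linguine count (10 out of 50) comes from the episode. The other counts below are invented so the table adds up to 50, and the variable names are just for illustration.

```python
from collections import Counter

# Hypothetical survey of 50 people. Only the linguine count (10 of 50)
# comes from the episode; the other counts are made up for illustration.
responses = (
    ["penne"] * 20 + ["rotini"] * 12 + ["linguine"] * 10 + ["angel hair"] * 8
)

frequency = Counter(responses)       # raw count per category
total = sum(frequency.values())      # total number of data points

# Relative frequency = raw frequency / total, a decimal between 0 and 1.
relative_frequency = {pasta: count / total for pasta, count in frequency.items()}

# Print category, raw frequency, relative frequency, and percentage.
for pasta, count in frequency.most_common():
    share = relative_frequency[pasta]
    print(f"{pasta:<12}{count:>4}{share:>7.2f}{share * 100:>6.0f}%")
```

Because every relative frequency is a share of the same whole, the values always add up to 1 (or 100%), which is what makes them easy to compare across surveys of different sizes.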

Relative frequency tables have the benefit

of being easy to compare.

No matter what we’re measuring or how many

data points we have, it’s easy to compare

percentages.

If 20% of people like linguine, we can see

that’s a smaller percent than the 67% of

people who like pineapple on pizza or greater

than the 10% of my family who thinks statistics

are scary.

The relative frequency table for favorite

pasta might look like this.

We can also add more than one variable to

our frequency table.

We could ask people to rate their favorite

pasta sauce and make a combined frequency

table, or a contingency table, of both pasta

and sauce preference.

If I were planning a party and needed to

pick some pasta for the group, my best bets

would be Rotini with Red Sauce and Penne with

Red or White sauce.
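
One way to build a table like that, sketched in Python. The counts are hypothetical; they were picked only so that the three combinations named above come out on top.

```python
from collections import Counter

# Hypothetical (pasta, sauce) answers; the counts are invented for illustration.
counts = {
    ("penne", "red"): 10, ("penne", "white"): 10,
    ("rotini", "red"): 10, ("rotini", "white"): 2,
    ("linguine", "red"): 4, ("linguine", "white"): 6,
    ("angel hair", "red"): 5, ("angel hair", "white"): 3,
}
# Expand the counts into one (pasta, sauce) pair per person surveyed.
answers = [pair for pair, n in counts.items() for _ in range(n)]

# Each cell of the contingency table counts one pasta/sauce combination.
table = Counter(answers)

pastas = ["penne", "rotini", "linguine", "angel hair"]
sauces = ["red", "white"]

# Print the table with one row per pasta and one column per sauce.
print(f"{'':<12}" + "".join(f"{s:>7}" for s in sauces) + f"{'total':>7}")
for p in pastas:
    row = [table[(p, s)] for s in sauces]
    print(f"{p:<12}" + "".join(f"{n:>7}" for n in row) + f"{sum(row):>7}")

# The most common combinations are the safest choices for a party.
best = table.most_common(3)
```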

And because I’m planning a party and because

I’m having food, I did look it up: the chance

of death by choking on food in the US in a

given year is 1 in 100,686.

But, sometimes we don’t want just numbers

in our visualization.

Earlier in the series, I talked about how it can be hard

to wrap your head around numbers--especially

when they get really big or really small.

There are other more visual ways to represent

categorical data.

One way to do this is with a bar chart.

A bar chart uses the frequencies that we saw

in our frequency table to create bars that

have a height equal to the frequency.

That way, we can compare the height of bars

instead of looking at raw numbers.

Here’s a bar chart representing the pasta

data we saw in our original frequency table.

You can see that penne is by far the most

chosen pasta, and how it compares to Angel Hair.
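
A minimal text-only sketch of the same idea in Python. The bars here run sideways, so length plays the role of height. The counts are hypothetical apart from linguine's 10, and `bar_chart` is a made-up helper name.

```python
# Hypothetical frequencies; only linguine's 10 (and penne being the most
# chosen) comes from the episode.
frequency = {"penne": 20, "rotini": 12, "linguine": 10, "angel hair": 8}

def bar_chart(freq):
    """Return one text bar per category; bar length equals the frequency."""
    width = max(len(name) for name in freq)
    return [f"{name:<{width}} | {'#' * count} {count}" for name, count in freq.items()]

lines = bar_chart(frequency)
print("\n".join(lines))
```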

Bar charts display a lot of information in

a very simple graph. They can also display

the frequencies of multiple variables.

Let’s say we want to compare each of these

pasta types with either white or red sauce.

We can either stack frequencies so it gives

us the same information as our contingency

table, or we can have bar charts side by side.

Pie charts are another way of displaying categorical

data.

They use the relative frequency of categories

to portion out pieces of a circle, just like

a pie.
The higher the relative frequency, the bigger

the slice of pie a category gets.
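
To make the slice sizes concrete, here is a small Python sketch that turns counts into slice angles. The counts are hypothetical (only linguine's 10 of 50 is from the episode).

```python
# Hypothetical pasta counts (only linguine's 10 of 50 is from the episode).
frequency = {"penne": 20, "rotini": 12, "linguine": 10, "angel hair": 8}
total = sum(frequency.values())

# Each slice's angle is its relative frequency times the full 360 degrees.
slice_degrees = {name: count * 360 / total for name, count in frequency.items()}

for name, degrees in slice_degrees.items():
    print(f"{name:<12}{degrees:>6.1f} degrees")
```

A category with 20% of the answers gets 20% of the circle, which is 72 degrees.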

Pie charts are useful because our eyes are

pretty good at comparing slices.

Our pasta data in a pie chart looks like this.

Pie charts are great at visually displaying

one variable.

But they struggle to effectively display more

than one variable, like our pasta and sauces

contingency table.

Another way to display categorical data is

a pictograph.

Pictographs represent frequency with pictures.

A picture, like the ball in this basketball

participation graph, will represent some number

of units, say 100 kids.

So if Riverdale High had 550 students participate

in their basketball programs, then the graph

would show 5.5 basketballs.
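
The conversion from a count to a number of pictures is just a division. A tiny Python sketch, where `icons_needed` is a hypothetical helper name:

```python
UNITS_PER_ICON = 100  # one basketball stands for 100 kids, as in the episode

def icons_needed(count, units_per_icon=UNITS_PER_ICON):
    """Return (whole icons, fraction of one more icon) for a given count."""
    whole, leftover = divmod(count, units_per_icon)
    return whole, leftover / units_per_icon

whole, fraction = icons_needed(550)
print(f"Riverdale High: {whole} full basketballs plus {fraction:.1f} of another")
```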

Sometimes pictographs represent frequencies

by increasing the size of the picture instead

and it’s not wrong, but it’s more difficult

for us to visually compare, especially for

small differences, which can be misleading.

Plus, at a casual glance, we don’t know

what the size difference means.

Are we comparing the diameter of the basketballs?

Or are we comparing their areas?
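
Why that distinction matters can be checked with a little geometry. This Python sketch treats each basketball picture as a circle, which is a simplifying assumption, and the function names are made up for illustration.

```python
import math

def circle_area(diameter):
    """Area of a circle with the given diameter."""
    return math.pi * (diameter / 2) ** 2

# Doubling the diameter multiplies the area by four, not two.
area_ratio = circle_area(2.0) / circle_area(1.0)

def diameter_scale_for_area_ratio(ratio):
    """How much to scale the diameter so the AREA grows by `ratio`."""
    return math.sqrt(ratio)

# To show "twice as many" by area, the diameter should only grow by about 1.41x.
scale_for_double = diameter_scale_for_area_ratio(2)
print(f"area ratio when diameter doubles: {area_ratio:.1f}")
print(f"diameter scale for double the area: {scale_for_double:.2f}")
```

So a picture drawn twice as wide can read as "four times as much" to a viewer who compares areas.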

*BREAKING NEWS*

This is Channel 2 News.

Looks like all you students out there are

really hitting the books!

Data from the US Department of Education shows

the graduation rate has been climbing!

So way to go everybody!
You’re passing the test of life with flying

colors!

Let’s push that stack of books even higher!

So, that last pictograph...not at all to scale.

See how the stacks of books are not proportionate?

It shows a difference of 5% (from 75% to 80%)

with a stack of books that is over *double*

the height of the 75% stack.

This makes the difference seem huge because

the axis doesn’t start at 0.

And yet, an increase from 80% to 81% is shown by

two stacks that are BARELY different in height,

even though the 5% difference looks huge.

Always keep an eye on those axes.
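
Here is a rough Python sketch of how a truncated axis inflates a difference. The episode does not say where the axis of the book-stack graph starts, so the starting value of 71 is purely an assumption chosen to give a stack a bit more than double the height.

```python
def drawn_height(value, axis_start=0.0):
    """Height of a bar (or book stack) when the axis begins at axis_start."""
    return value - axis_start

def apparent_ratio(low, high, axis_start=0.0):
    """How many times taller the `high` bar looks compared to the `low` bar."""
    return drawn_height(high, axis_start) / drawn_height(low, axis_start)

# With an axis that starts at 0, 80% looks only slightly taller than 75%.
honest = apparent_ratio(75, 80)

# With a hypothetical axis starting at 71, the same data looks more than doubled.
truncated = apparent_ratio(75, 80, axis_start=71)

print(f"axis from 0:  80% stack is {honest:.2f}x the 75% stack")
print(f"axis from 71: 80% stack is {truncated:.2f}x the 75% stack")
```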

Let’s loop back to quantitative data, which

as you’ll remember, have a meaningful order

and consistent spacing.

Frequency tables can be used to display quantitative

data, like age, or height, or ounces of olive

oil in your house.

We just have to create categories out of our

quantitative data first.

We do that with a process called “binning”.

Binning takes a quantitative variable and

bins it into categories that are either pre-existing

or made up.

For example I can say that 0-15 oz of olive

oil is “Very Little”, 16-32 oz is “Average”,

33-49 oz is “A Lot” and 50+ oz is “Excessive”--like

suspiciously Excessive.

Like Will’s 14 cats excessive.

Why do you need so much olive oil?

Anyway, once I’ve binned my data, I can

create a frequency table or relative frequency

table, just like with our pasta example.

It might look something like this.
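
A minimal Python sketch of binning, using the bin edges from the episode. The olive oil amounts are hypothetical, and the code assumes amounts are reported in whole ounces, since the episode's bins leave gaps between, say, 15 and 16.

```python
from collections import Counter

def olive_oil_bin(ounces):
    """Map whole ounces of olive oil to the episode's four categories."""
    if ounces < 0:
        raise ValueError("ounces cannot be negative")
    if ounces <= 15:
        return "Very Little"
    if ounces <= 32:
        return "Average"
    if ounces <= 49:
        return "A Lot"
    return "Excessive"

# Hypothetical households, in whole ounces.
households = [0, 4, 12, 15, 16, 20, 24, 25, 30, 32, 33, 40, 48, 55, 85]

# Once binned, the data can be counted just like categorical data.
binned = Counter(olive_oil_bin(oz) for oz in households)
total = sum(binned.values())

for label in ["Very Little", "Average", "A Lot", "Excessive"]:
    print(f"{label:<12}{binned[label]:>3}{binned[label] / total:>7.2f}")
```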

Binning is most useful when there are pre-existing

“bins” for our data.

Like, you can divide age-in-years into the

bins “Child”, “Teen”, “Adult”

and “Older Adult” because those are pre-existing

categories.

We can also take a score on a depression test

and create two bins: “clinically depressed”

and “not clinically depressed”.

You can see from this example that bins don’t

HAVE to be equally spaced, but if you see

quantitative data that has been binned, make sure

that the way it was divided up was appropriate

for the situation.

Unequally spaced bins can be misleading unless

there’s a real world distinction to back

it up.

Say politician X wants to make himself look

popular, but it seems like people in their

30’s really hate him.

(probably because he said that the reason

they can’t afford a house is their brunch

habit).

Politician X wants to hide the fact that over

80% of people in their 30’s said they won’t

vote for him.

So he does some “re-binning”.

Traditionally the data are binned roughly

by decade: 18 years old to 29 years old, 30

years old to 39 years old, 40 to 49...you

get the point.

But Mr. X needs to hide these hateful 30-somethings

in the data.

The old chart looked like this:

But Politician X decided to split up the 30-somethings

to make his numbers look better:
He moved the data around to hide the glaring

group of 30-year-old dissenters.

Instead of showing the truth that 30-somethings

despise him, we see a more...positive view

of his popularity.

By splitting the 30-somethings and putting

some of them into two other, larger groups,

he can obscure their political dissatisfaction.

Looking at this new table, he’d win the

popularity vote in each of the 5 new bins.

If I don’t show you the number of voters

per bin, it seems legit...
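
The trick can be reproduced with made-up numbers. In this Python sketch the poll data, the age groups, and the two-bin regrouping are all hypothetical (the episode's table uses five new bins that aren't reproduced here); the point is only that the same answers give a different picture under different bins.

```python
# Hypothetical poll: (youngest age, oldest age, voters polled, supporters of X).
groups = [
    (18, 29, 100, 60),
    (30, 34, 25, 4),
    (35, 39, 25, 5),
    (40, 49, 100, 60),
]

def support_by_bin(groups, bins):
    """Return {(low, high): share supporting X} for each age bin.

    Assumes every group fits entirely inside exactly one bin.
    """
    result = {}
    for low, high in bins:
        inside = [g for g in groups if low <= g[0] and g[1] <= high]
        voters = sum(g[2] for g in inside)
        supporters = sum(g[3] for g in inside)
        result[(low, high)] = supporters / voters
    return result

# Binned by decade, the 30-somethings clearly stand out: only 18% support.
by_decade = support_by_bin(groups, [(18, 29), (30, 39), (40, 49)])

# Split the 30-somethings into two larger neighbors and the dissent disappears.
rebinned = support_by_bin(groups, [(18, 34), (35, 49)])

for label, shares in [("by decade", by_decade), ("re-binned", rebinned)]:
    for (low, high), share in shares.items():
        print(f"{label}: ages {low}-{high}: {share:.0%} support")
```

In the re-binned version every bin shows majority support, even though not a single answer changed. Showing the number of voters per bin is what gives the game away.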

Another categorical graphing method we can

apply to quantitative data is bar charts.

When we use bar charts for quantitative data,

we squish the bars together so that they’re

touching and we call them histograms.

The bars are squished together because the

data are ‘continuous’ which means the

values in one bar flow into the next bar,

there’s no separation like in our categorical

bar charts.

In histograms, like bar charts, the height

of the bars tells us how frequently data in

a certain range occur.

A histogram also gives us information about

how the data is distributed.

We can estimate where the mean, median and

mode of our data are as well as see how spread

out the data is.

Look at this histogram for our olive oil data.

For this histogram, we can see that the range

of the data is approximately 85 since it covers

values 0-85 ounces and that it’s right skewed

(the tail is to the right), and that its

center is around 25 ounces.

The histogram gives us more information about

the data than a frequency table does, but

it’s still obscuring WHAT the specific

data values are.
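
A small Python sketch of how a histogram is counted. The olive oil amounts are hypothetical, chosen only to roughly match the description above (values from 0 to 85 ounces, a center near 25, a tail to the right), and the 10-ounce bin width is an arbitrary choice.

```python
import statistics

def histogram(values, bin_width):
    """Count whole-number values in equal-width bins [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    # Fill in empty bins so gaps in the data stay visible.
    lo, hi = min(counts), max(counts)
    return {s: counts.get(s, 0) for s in range(lo, hi + bin_width, bin_width)}

# Hypothetical olive oil amounts, in whole ounces.
ounces = [0, 5, 8, 12, 14, 16, 18, 20, 21, 22, 24, 25, 25, 26, 28,
          30, 31, 33, 35, 38, 42, 47, 55, 68, 85]

hist = histogram(ounces, bin_width=10)
for start, count in hist.items():
    print(f"{start:>2}-{start + 9:<2} | {'#' * count}")

data_range = max(ounces) - min(ounces)
# In right-skewed data the long tail pulls the mean above the median.
mean, median = statistics.mean(ounces), statistics.median(ounces)
print(f"range={data_range}, mean={mean:.1f}, median={median}")
```

Notice that the printed bars only say how many values landed in each 10-ounce range, not what those values were.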

If you read the news--or watch the news--you

will see these representations over and over

and over.

You will likely see far more of these charts

and graphs than you will create.

The big takeaway here, as a consumer of these

things, is to look closely at what the visualization

is actually telling you.

Or maybe trying to hide from you.

These charts and graphs give us another way

to comprehend numbers--to see the big picture.

Thanks for watching!

I’ll see you next week.
